THE  EFFECT  OF  MULTIDIMENSIONALITY  ON 

UNIDIMENSIONAL  EQUATING  WITH 

ITEM  RESPONSE  THEORY 


By 
PATRICIA  DUFFY  SPENCE 


A  DISSERTATION  PRESENTED  TO  THE  GRADUATE  SCHOOL 

OF  THE  UNIVERSITY  OF  FLORIDA  IN  PARTIAL  FULFILLMENT 

OF  THE  REQUIREMENTS  FOR  THE  DEGREE  OF 

DOCTOR  OF  PHILOSOPHY 

UNIVERSITY  OF  FLORIDA 

1996 


This  dissertation  is  dedicated  to  the  memory  of  my  father 

James  F.  Duffy 

1929-1992 


ACKNOWLEDGMENTS 

An  effort  of  this  magnitude  always  involves  many  people.  The  author 
wishes  to  especially  thank  the  chairman  of  her  committee,  Dr.  M.  David  Miller, 
for  his  dedication  and  inspiration.  Without  his  encouragement  and  good  humor, 
this  dissertation  would  not  have  been  possible.  The  author  would  also  like  to 
thank  her  committee  members  for  their  guidance  and  patience,  particularly  Dr. 
James  Algina  and  Dr.  Linda  Crocker.  Their  suggestions  were  always  correct,  if 
not  always  accepted.  Also,  without  the  inspiration  of  Dr.  Charles  Dziuban  of  the 
University  of  Central  Florida,  she  would  never  have  pursued  studies  in  this  field. 

In  addition,  the  author  recognizes  her  colleagues,  past  and  present,  at 
The  Psychological  Corporation,  Volusia  County  District  Schools,  and  the  Florida 
Department  of  Education  for  the  opportunities  to  apply  her  learning  in  practical 
situations. Gratitude is offered to her three parents (Jim, Joan, and Jeanne),
who stressed the importance of learning and doing things well. Thanks also to
special  friends:  Anne  Seraphine  for  debating  the  meaning  of  life  and  monotonic 
curves;  Nada  Stauffer  for  quiet  friendship;  George  Suarez  for  making  her  laugh; 
and  Carlos  Guffain  for  demanding  her  best.  But  the  author  is  most  indebted  and 
grateful  to  her  husband,  Verne,  who  has  supported  and  encouraged  her  through 
three  degrees,  and  her  daughter  Cindy  who  is  now  left  to  carry  on  the  Gator 
tradition  alone. 


TABLE  OF  CONTENTS 

page 

ACKNOWLEDGMENTS iii

LIST  OF  TABLES vii 

LIST  OF  FIGURES xi 

ABSTRACT xii 

CHAPTERS 

1  INTRODUCTION 1 

Purpose 3 

Limitations 4 

Significance  of  the  Study 4 

2  REVIEW  OF  LITERATURE 6 

Test  Equating 6 

Conditions  for  Equating 6 

Data  Collection  Designs 7 

Single-group  designs 7 

Equivalent-group  designs 9 

Anchor-test  designs 10 

Equating  Methods 14 

Conventional  Methods  of  Equating 14 

Linear  equating 16 

Equipercentile  equating 18 

Equating  Methods  Based  on  Item  Response  Theory 21 

Item  response  theory 21 

IRT  equating 28 




Multidimensionality 35 

Violation  of  the  Unidimensionality  Assumption 35 

Multidimensional  Models 37 

Multidimensionality  and  Parameter  Estimation 45 

Multidimensionality  and  IRT  equating 52 

3  METHOD 58

Purpose    58 

Introduction 58 

Research  Questions 58 

Data  Generation 59 

Design 59 

Model  Description 60 

Item  Parameters 61 

Response  Data 63 

Noncompensatory  Data 65 

Nonrandom  Groups 66 

Estimation  of  Parameters 66 

Unidimensional  IRT 66

Analytical  Estimation 69 

Equating 69 

Concurrent  Calibration 70 

Equated  bs 70 

Characteristic  Curve  Transformation 71 

Evaluation  Criteria 73 

Comparison  Conditions 73 

Statistical  Criteria 75 

Summary 76 

4  RESULTS  AND  DISCUSSION 78

Simulated  Data 78 

Item  Parameters 78 

Analytical  Estimation 88 

Simulated  Ability  Data 88 

Equating  Results  for  Randomly  Equivalent  Groups 92 

Concurrent  Calibration 92 

Equated  bs 99 

Characteristic  Curve  Transformation 103 

Equating  Results  for  Nonequivalent  Groups 103 

Concurrent  Calibration 103 

Equated  bs  and  Characteristic  Curve  Transformation 108 


5  CONCLUSIONS 111

Effects  of  Multidimensional  Model 111 

Effects  of  Equating  Method 112 

Effects  of  the  Number  of  Multidimensional  Items 112 

Effects  of  Nonequivalent  Examinee  Groups 115 

Implications 116 

APPENDIX 

ITEM  PARAMETER  DATA 118 

REFERENCES 151 

BIOGRAPHICAL  SKETCH 158 




LIST  OF  TABLES 

Table  page 

1  Summary  of  Recommendations  for  a  Successful  Equating 15 

2  Summary  of  Unidimensional  IRT  Test  Equating  Studies 36 

3  Summary  of  Studies  of  Unidimensional  IRT  Estimation 

with  Multidimensional  Data 50 

4  Summary  of  Studies  of  Unidimensional  Equating  with 

Multidimensional  Data 57 

5  Simulated  Compensatory  Parameters  for  MD30,  Form  A 64 

6  Simulated  Noncompensatory  Parameters  for  Multidimensional 

Items,  MD30  Form  A 67 

7  Summary  Statistics  for  Multidimensional  Items  in  Compensatory 

and  Noncompensatory  Datasets 68 

8  Summation  of  Research  Equating  Conditions 72 

9  Analytical  Estimates  of  the  Unidimensional  Parameters  for 

Compensatory  MD30,  Form  A  74 

10  Descriptive  Statistics  for  Compensatory  Form  A  Item  Parameters 79

11  Descriptive  Statistics  for  Compensatory  Form  B  Item  Parameters 80

12  Descriptive  Statistics  for  Multidimensional  Item  Parameters  in 

Noncompensatory  Form  A 81 

13  Descriptive  Statistics  for  Multidimensional  Item  Parameters  in 

Noncompensatory  Form  B 82 

14  Descriptive  Statistics  for  Analytical  Unidimensional  Estimates  of 

Form  A  Item  Parameters  89 


15  Summary  Statistics  for  Analytical  Unidimensional  Estimates  of

Form  B  Item  Parameters  90 

16  Descriptive  Statistics  for  Simulated  Examinees  Taking  MD10 91 

17  Descriptive  Statistics  for  Simulated  Examinees  Taking  MD20 92 

18  Descriptive  Statistics  for  Simulated  Examinees  Taking  MD30 93 

19  Descriptive  Statistics  for  Simulated  Examinees  Taking  MD40 94 

20  Descriptive  Statistics  for  Simulated  Low  Ability  Examinees 95 

21  Summary  of  Concurrent  Calibration  Results  with  Randomly 

Equivalent  Groups 96 

22  Constants  for  Equated  bs  Equating  of  Compensatory  Forms  with 

Randomly  Equivalent  Groups 100 

23  Constants  for  Equated  bs  Equating  of  Noncompensatory  Forms 

with  Randomly  Equivalent  Groups 101 

24  Summary  of  Equated  bs  Results  with  Randomly  Equivalent 

Groups 102 

25  Summary  of  Characteristic  Curve  Transformation  Results  with 

Randomly  Equivalent  Groups 104 

26  Summary  of  Equating  Results  with  Nonequivalent  Groups 106

27  Constants  for  Equated  bs  Equating  of  Compensatory  Forms  with 

Nonequivalent  Examinee  Groups 109 

28  Simulated  Compensatory  Item  Parameters  for  MD10  Form  A 119 

29  Simulated  Compensatory  Item  Parameters  for  MD10  Form  B 120 

30  Simulated  Compensatory  Item  Parameters  for  MD20  Form  A 121 

31  Simulated  Compensatory  Item  Parameters  for  MD20  Form  B 122 

32  Simulated  Compensatory  Item  Parameters  for  MD30  Form  A 123 

33  Simulated  Compensatory  Item  Parameters  for  MD30  Form  B 124 




34  Simulated  Compensatory  Item  Parameters  for  MD40  Form  A 125 

35  Simulated  Compensatory  Item  Parameters  for  MD40  Form  B 126 

36  Noncompensatory  Item  Parameters  for  Multidimensional  Items 

in  MD10  Forms  A  and  B 127

37  Noncompensatory  Item  Parameters  for  Multidimensional  Items 

in  MD20  Form  A 128 

38  Noncompensatory  Item  Parameters  for  Multidimensional  Items 

in  MD20  Form  B 129 

39  Noncompensatory  Item  Parameters  for  Multidimensional  Items 

in  MD30  Form  A 130

40  Noncompensatory  Item  Parameters  for  Multidimensional  Items 

in  MD30  Form  B 131 

41  Noncompensatory  Item  Parameters  for  Multidimensional  Items 

in  MD40  Form  A 132 

42  Noncompensatory  Item  Parameters  for  Multidimensional  Items 

in  MD40  Form  B 133 

43  Analytical  Estimates  of  Unidimensional  Item  Parameters  for 

MD10  Form  A 134 

44  Analytical  Estimates  of  Unidimensional  Item  Parameters  for 

MD10  Form  B 135

45  Analytical  Estimates  of  Unidimensional  Item  Parameters  for 

MD20  Form  A 136 

46  Analytical  Estimates  of  Unidimensional  Item  Parameters  for 

MD20  Form  B 137

47  Analytical  Estimates  of  Unidimensional  Item  Parameters  for 

MD30  Form  A 138 

48  Analytical  Estimates  of  Unidimensional  Item  Parameters  for 

MD30  Form  B 139

49  Analytical  Estimates  of  Unidimensional  Item  Parameters  for 

MD40  Form  A 140 




50  Analytical  Estimates  of  Unidimensional  Item  Parameters  for 

MD40  Form  B 141

51  Descriptive  Statistics  for  Compensatory  MD10  Linking  Items 

with  Randomly  Equivalent  Groups 142 

52  Descriptive  Statistics  for  Compensatory  MD20  Linking  Items 

with  Randomly  Equivalent  Groups 143 

53  Descriptive  Statistics  for  Compensatory  MD30  Linking  Items 

with  Randomly  Equivalent  Groups 144 

54  Descriptive  Statistics  for  Compensatory  MD40  Linking  Items 

with  Randomly  Equivalent  Groups 145 

55  Descriptive  Statistics  for  Noncompensatory  MD10  Linking  Items 146

56  Descriptive  Statistics  for  Noncompensatory  MD20  Linking  Items 147

57  Descriptive  Statistics  for  Noncompensatory  MD30  Linking  Items 148

58  Descriptive  Statistics  for  Noncompensatory  MD40  Linking  Items 149

59  Descriptive  Statistics  for  Compensatory  Linking  Items  with 

Nonequivalent  Groups 150 


LIST  OF  FIGURES 


Figure  page 

1  An  item  characteristic  curve  (ICC)  based  on  the  three-parameter

logistic  model 23

2  An  item  response  surface  (IRS)  based  on  the  compensatory 

M2PL 40 

3  Item  response  surfaces  and  contour  plots  for  item  9,  MD20, 

α  =  20° 84

4  Item  response  surfaces  and  contour  plots  for  item  10,  MD20, 

α  =  30° 85

5  Item  response  surfaces  and  contour  plots  for  item  11,  MD20,

α  =  45° 86

6  Item  response  surfaces  and  contour  plots  for  item  12,  MD20, 

α  =  60° 87


Abstract  of  Dissertation  Presented  to  the  Graduate  School 
of  the  University  of  Florida  in  Partial  Fulfillment  of  the 
Requirements  for  the  Degree  of  Doctor  of  Philosophy 

THE  EFFECT  OF  MULTIDIMENSIONALITY  ON 

UNIDIMENSIONAL  EQUATING  WITH 

ITEM  RESPONSE  THEORY 

by 

Patricia  Duffy  Spence 

May,  1996 

Chairman:  M.  David  Miller 

Major  Department:  Foundations  of  Education 

Test  publishers  apply  unidimensional  equating  techniques  to  their  products 
even  though  tests  are  expected  to  be  multidimensional  to  some  degree.  This 
simulation  study  investigated  the  effects  of  ignoring  multidimensional  data  in 
applying  unidimensional  item  response  theory  equating  procedures.  The 
specific  effects  studied  were  (a)  multidimensional  model,  (b)  type  of  equating 
procedure,  (c)  number  of  multidimensional  items,  and  (d)  distribution  of 
examinee  ability. 

Four  test  conditions  were  created  by  varying  the  number  of  multidimensional 
items contained in each test. The compensatory multidimensional two-parameter
logistic model was selected for data generation. Four degrees of
multidimensionality were spiraled throughout each test. The data were then
transformed  into  corresponding  noncompensatory  items  which  had  the  same 
probability  of  success  as  the  compensatory  item  for  a  given  examinee. 

Four  tests  with  40  items  each  were  simulated  with  12  common  linking  items 
and  28  unique  items.    For  each  experimental  condition  and  form,  responses  for 
1,000 simulees were generated. To examine the effects of nonrandom groups,
responses  for  1,000  less  able  examinees  were  also  generated. 
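
The data-generation step described above can be sketched in a few lines. The following is a minimal illustration, assuming the usual compensatory M2PL form in which the probability of a correct response is a logistic function of a weighted sum of two abilities plus an intercept. The function names, the parameter ranges, the uncorrelated standard normal abilities, and the choice of which items load on the second dimension are hypothetical and are not the values used in the study.

```python
import numpy as np


def m2pl_prob(theta, a, d):
    """Compensatory M2PL probability of a correct response.

    theta: (n_examinees, 2) abilities; a: (n_items, 2) discriminations;
    d: (n_items,) intercepts. Returns an (n_examinees, n_items) array.
    """
    logit = theta @ a.T + d
    return 1.0 / (1.0 + np.exp(-logit))


def simulate_form(n_examinees=1000, n_items=40, n_multidim=10, seed=0):
    """Simulate 0/1 responses for one form (all inputs are illustrative)."""
    rng = np.random.default_rng(seed)
    # Assumption: two uncorrelated standard normal abilities per simulee.
    theta = rng.standard_normal((n_examinees, 2))
    a = np.zeros((n_items, 2))
    a[:, 0] = rng.uniform(0.5, 1.5, size=n_items)
    # Only the first n_multidim items also load on the second ability.
    a[:n_multidim, 1] = rng.uniform(0.5, 1.5, size=n_multidim)
    d = rng.normal(0.0, 1.0, size=n_items)
    p = m2pl_prob(theta, a, d)
    # A response is correct when a uniform draw falls below the probability.
    responses = (rng.random(p.shape) < p).astype(int)
    return theta, a, d, responses
```

Calling `simulate_form()` returns the simulated abilities, the item parameters, and a 1,000 by 40 matrix of scored responses. Items with a zero second discrimination behave as unidimensional items under this model.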

Three  unidimensional  IRT  equating  methods  were  selected:  (a) 
concurrent  calibration,  (b)  equated  bs,  and  (c)  characteristic  curve 
transformation.  Parameters  were  calibrated  with  BILOG386.  To  evaluate  the 
results of the research equatings, three comparison conditions were used: (1)
the  unidimensional  approximations  of  the  multidimensional  item  parameters 
calculated  using  an  analytic  procedure;  (2)  the  simulated  first  ability  dimension 
only;  and  (3)  the  averages  of  the  two  simulated  abilities.  Three  statistical 
criteria (correlation, standardized differences between means, and standardized
root mean square difference) were applied to the data.
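
The three criteria named above might be computed as in the sketch below. The abstract does not give the exact formulas, so the definitions here are assumptions: the mean difference and the root mean square difference are both divided by the sample standard deviation of the baseline values. The function name `equating_criteria` is hypothetical.

```python
import numpy as np


def equating_criteria(estimate, baseline):
    """Compare equated estimates with a baseline (assumed definitions)."""
    estimate = np.asarray(estimate, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    # Assumption: standardize by the sample SD of the baseline values.
    sd = baseline.std(ddof=1)
    diff = estimate - baseline
    return {
        "correlation": float(np.corrcoef(estimate, baseline)[0, 1]),
        "std_mean_diff": float(diff.mean() / sd),
        "std_rmsd": float(np.sqrt(np.mean(diff ** 2)) / sd),
    }
```

Under these definitions, a constant shift between estimate and baseline leaves the correlation at 1 but shows up equally in the standardized mean difference and the standardized root mean square difference.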

No significant effect on the unidimensional equating results was
attributed to choice of multidimensional model. For randomly equivalent groups,
there were also no effects due to choice of equating procedure. Concurrent
calibration  favored  low  ability  examinees  when  the  ability  distributions  of  the  two 
groups  were  unequal.  When  the  multidimensional  composites  described  by  the 
analytical estimation baseline are the data of interest, the number of
multidimensional items had little effect on the unidimensional equating with
randomly  equivalent,  normally  distributed  examinee  groups.  However,  if  the 
unidimensional  factor  is  the  trait  of  interest,  the  number  of  multidimensional 
items  affected  the  equating  outcomes,  with  results  deteriorating  as  the  number 
of  multidimensional  items  increased.  When  examinee  groups  were  not 
equivalent,  equating  results  were  affected  in  all  conditions.  Caution  is  advised 
in  applying  unidimensional  equating  procedures  when  the  examinee  groups  are 
suspected  of  being  from  different  ability  levels. 


CHAPTER  1 
INTRODUCTION 


In  many  large  testing  programs,  examinees  take  one  of  multiple  forms  of 
the  same  test.  Although  the  different  editions  are  constructed  to  be  as  similar  in 
content  and  difficulty  as  possible,  it  is  inevitable  that  some  differences  will  exist 
among  the  various  forms  (Petersen,  Cook,  &  Stocking,  1983).  Direct 
comparison  of  scores  would,  therefore,  be  unfair  to  an  examinee  who 
happened  to  take  a  more  difficult  form.  Because  examinees  are  often  in 
competition  or  are  being  directly  compared,  it  is  important  to  transform  the 
scores  in  some  way  to  make  them  equivalent. 

Equating  is  the  statistical  process  of  establishing  equivalent  raw  or  scaled 
scores  on  two  or  more  test  forms.  Theoretically,  the  equating  process  adjusts 
for  test  and  item  characteristics  so  the  propensity  distributions  would  be  the 
same  regardless  of  which  test  form  was  administered.  The  application  of 
equating  to  real  data,  however,  can  be  full  of  problems  and  complications 
(Skaggs  &  Lissitz,  1986a).  In  practice,  equating  requires  not  only  a  knowledge 
of  statistical  models,  but  awareness  and  consideration  of  many  other  issues  that 
have  practical  consequences  for  the  use  and  interpretation  of  results.  Brennan 
and  Kolen  (1987)  discussed  many  of  these  issues,  such  as  the  presence  of 
equating  errors,  specification  of  content,  and  security  breaches. 


Many  mathematical  procedures  have  emerged  to  develop  the  equating 
transformations.  Some  are  based  on  classical  test  theory  while  others  arise 
from  item  response  theory  (IRT).  Classical  methods,  including  linear  and 
equipercentile  equating,  do  not  seem  robust  to  departures  from  optimal 
conditions  (Cook  &  Eignor,  1983;  Livingston,  Dorans,  &  Wright,  1990;  Skaggs  & 
Lissitz,  1986b).  Item  response  theory  procedures,  including  equated  bs, 
concurrent  calibration,  and  characteristic  curve  transformation,  present 
alternatives.  Equating  methods  based  on  IRT  have  been  found  more  accurate 
than  those  based  on  classical  models  (Harris  &  Kolen,  1985;  Hills,  Subhiyah,  & 
Hirsch,  1988;  Kolen,  1981;  Marco,  Petersen,  &  Stewart,  1983;  Petersen,  Cook, 
&  Stocking,  1983). 

IRT  models  are  grounded  on  strong  assumptions,  particularly  that  the 
item  responses  are  unidimensional  (Ansley  &  Forsyth,  1985).  The 
unidimensionality  assumption  requires  that  each  of  the  tests  to  be  equated 
measures  the  same  underlying  ability.  Any  other  factor  that  influences  an 
examinee's score, such as guessing, speededness, cheating, item context, or
instructional sensitivity, will violate the unidimensionality assumption. Some of
these  violations  can  be  controlled,  reduced,  or  eliminated,  but  the 
unidimensionality  assumption  will  still  be  violated  in  many  practical  testing 
situations  (Doody-Bogan  &  Yen,  1983). 

Attempts  have  been  made  to  model  multidimensional  responses  within 
the  framework  of  IRT.  Although  these  models  describe  multidimensional  data 
more accurately than unidimensional models, estimation of parameters is
complex and difficult in practice (Harrison, 1986). Test companies continue to
apply  unidimensional  equating  procedures  to  their  products.  The  viability  of 
using  unidimensional  models  with  multidimensional  data  must  be  explored  to 
determine  the  effect  on  the  equating  outcomes.  An  understanding  of  what  effect 
multidimensional  data  have  on  unidimensional  equating  results  is  of  paramount 
importance.  Empirical  studies  (Camilli,  Wang,  &  Fesq,  1995;  Cook  &  Eignor, 
1988;  Dorans  &  Kingston,  1985;  Yen,  1984)  indicate  that  violation  of  the 
unidimensionality  assumption,  while  having  some  impact  on  results,  may  not  be 
significant. However, each of these studies employed data from a different test,
and test content may have influenced the findings in an unknown manner. The
number of multidimensional items and the degree of multidimensionality in each
test are also unknown. Therefore, the results are difficult to generalize
across studies (Skaggs & Lissitz, 1986a). It is necessary to design research
studies  that  permit  manipulation  of  independent  variables  to  understand  exactly 
how  violations  of  the  unidimensionality  assumption  affect  equating.  Simulation 
studies  present  a  technique  to  manipulate  and  control  the  desired  variables. 

Purpose 

The  purpose  of  the  present  study  was  to  investigate  the  effect  of 
multidimensional  data  in  applying  unidimensional  IRT  equating  techniques.  The 
specific  questions  to  be  answered  were: 

1.  Does the number of multidimensional items affect unidimensional
equating results?

2.  Does the equating procedure affect unidimensional equating
results?

3.  Do  data  simulated  by  using  a  compensatory  model  produce 
different  unidimensional  equating  results  than  data  simulated  by  using  a 
noncompensatory  model? 

4.  Are  unidimensional  equating  results  affected  by  the  ability 
distribution  of  the  two  examinee  groups? 

Limitations 

Results  of  this  study  are  applicable  only  to  the  research  conditions 
investigated.  Generalizations  to  other  item  response  theory  models  or  other 
equating  techniques  are  not  justified. 

Significance  of  the  Study 

In  practice,  test  publishers  today  apply  unidimensional  equating 
techniques  to  their  products.  Because  tests  are  expected  to  be 
multidimensional  to  some  degree  and  it  is  difficult  to  identify  multidimensionality 
accurately,  it  is  important  to  investigate  the  effect  of  applying  unidimensional 
equating  techniques  to  multidimensional  data.  Previous  studies  have  mainly 
explored unidimensional equating with empirical data that were suspected of
being  multidimensional.  Although  the  results  indicated  the  impact  of  violating 
the  unidimensionality  assumption  may  not  be  significant,  the  research  designs 
did  not  allow  manipulation  of  independent  variables.  In  addition,  the  true 
multidimensionality  of  the  underlying  data  was  unknown  in  these  empirical 
studies. 


The  current  simulation  study  allowed  exploration  of  what  effect 
multidimensionality  had  on  the  results  obtained  from  a  variety  of  unidimensional 
equating  procedures  while  providing  a  means  to  manipulate  variables.  The 
techniques  used  to  generate  the  data  afforded  a  mechanism  to  control  the 
dimensionality  of  the  items  and  test  forms.  The  specific  questions  investigated 
were  selected  as  having  the  most  value  for  current  practitioners  applying 
unidimensional  equating  procedures. 


CHAPTER  2 
REVIEW  OF  LITERATURE 


Test  Equating 


Conditions  for  Equating 

The  purpose  of  equating  is  to  establish  a  relationship  between  two  test 
forms  so  that  it  becomes  a  matter  of  indifference  to  the  examinee  which  form  is 
taken.  Petersen,  Kolen,  and  Hoover  (1989)  stated  that  equating  itself  is  simply 
an  empirical  procedure  which  imposes  no  restrictions  on  the  properties  of  scores 
or  on  the  method  used  to  define  the  transformation.  It  is  only  when  the  purpose 
of  equating  and  the  definition  of  equivalent  scores  are  considered  that  restrictions 
become  necessary. 

Lord  (1980)  outlined  four  conditions  that  must  be  met  for  the  successful 
equating  of  two  test  forms,  X  and  Y.  Briefly,  the  conditions  are  (a)  equity,  (b) 
population  invariance,  (c)  symmetry,  and  (d)  same  ability.  To  satisfy  the  equity 
condition, it must make no difference to examinees at every ability level, $\theta$, which
form of the test is taken. The conditional frequency distribution $f_{x \mid \theta}$ of the score
on form X should be the same as the conditional frequency distribution of the
transformed form Y score, $f_{x(y) \mid \theta}$. Lord (1980) added that it is not sufficient for
equity that $f_{x \mid \theta}$ and $f_{x(y) \mid \theta}$ have the same means; they must also have equal
variances. If the tests are not equally reliable, it is no longer a matter of
indifference which form is administered. The equity condition requires the
standard  error  of  measurement  and  the  higher  moments  to  be  the  same  after 
transformation  for  examinees  of  identical  ability.  To  fully  satisfy  this  requirement, 
test  forms  X  and  Y  must  be  strictly  parallel  (Kolen,  1981).  However,  if  this 
condition  is  met,  equating  is  no  longer  necessary. 
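
The equity condition paraphrased above can be written compactly. The display below is a restatement rather than a quotation from Lord (1980). It writes $f_{x \mid \theta}$ for the conditional frequency distribution of the form X score at ability $\theta$ and $x(y)$ for the form Y score transformed to the form X scale. The expectation and variance notation is added here for illustration.

```latex
\[
  f_{x \mid \theta} = f_{x(y) \mid \theta} \quad \text{for all } \theta ,
\]
which implies, in particular,
\[
  E[x \mid \theta] = E[x(y) \mid \theta],
  \qquad
  \operatorname{Var}(x \mid \theta) = \operatorname{Var}(x(y) \mid \theta).
\]
```

Equality of the conditional variances is what ties equity to equal reliability: two forms can agree in conditional mean at every $\theta$ and still differ in standard error of measurement.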

In  practice,  it  is  nearly  impossible  to  construct  multiple  forms  that  are 
strictly  parallel.  Therefore,  equating  is  needed.  Although  the  equity  condition  can 
never  be  met  precisely,  it  serves  to  keep  the  purpose  of  equating  in  mind  and 
guide  the  steps  in  the  process. 

The  population  invariance  and  symmetry  conditions  also  arise  from  the 
desire  to  achieve  equivalent  scores.  If  the  scores  from  form  X  and  form  Y  are 
equivalent,  there  is  a  one-to-one  relationship  between  the  two  sets  of  scores. 
The  transformation  must  be  unique,  independent  of  the  groups  used  to  derive  the 
conversion  (Petersen  et  al.,  1989).  The  purpose  of  equating  also  requires  that 
the  equating  function  be  invertible  or  symmetric.  The  equating  must  be  the  same 
regardless  of  which  test  is  labelled  X  and  which  test  is  labelled  Y  (Lord,  1980). 

The  two  tests  to  be  equated  must  also  measure  the  same  characteristic, 
whether  defined  as  a  latent  trait,  ability,  or  skill.  This  condition  distinguishes  true 
equating  from  scaling.  Scores  on  X  and  Y  can  always  be  placed  on  the  same 
scale,  but  they  must  measure  the  same  construct  to  be  considered  equated 
(Dorans,  1990). 


It  is  unlikely  that  all  conditions  of  equating  can  be  met  in  practice. 
However,  good  approximations  to  this  ideal  can  be  achieved  and  are  usually 
fairer  to  examinees  than  if  no  attempt  at  equating  had  occurred  (Petersen  et  al., 
1989).  Research  conducted  over  the  past  20  years  serves  as  a  guide  in  the 
application and interpretation of equating transformations.

Data  Collection  Designs

Every equating consists of two parts: a data collection design and an
analytical  method  to  determine  the  appropriate  transformation.  Three  basic 
sampling  designs  are  most  frequently  described  in  the  literature  (Dorans,  1990; 
Dorans  &  Kingston,  1985;  Petersen  et  al.,  1989).  The  designs  are  classified  as 
(a)  single-group  designs,  (b)  equivalent-groups  designs,  and  (c)  anchor-test 
designs.

Single-group  designs

In  single-group  designs,  both  forms  or  tests  to  be  equated  are  given  to 
the  same  group  of  examinees.  The  difficulty  levels  of  the  tests  are  not 
confounded  with  the  differences  in  the  ability  levels  of  the  groups  taking  each 
test  because  the  examinees  are  the  same  (Hambleton  &  Swaminathan,  1985). 
However,  Lord  (1980)  pointed  out  that  the  test  administered  second  is  not  being 
given  under  typical  conditions.  Practice  effects  and  fatigue  may  affect  the 
equating  process.  To  deal  with  this  threat,  the  counterbalanced  random-groups 
design may be employed. The single group is divided into two random
half-groups. Both half-groups then take both tests in counterbalanced order, one
group taking the old form first and the other taking the new form first (Petersen
et  al.,  1989).  Scores  on  both  parallel  forms  are  then  equally  affected  by 
learning,  fatigue,  and  practice. 
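
The counterbalanced random-groups assignment described above can be illustrated with a short sketch. The function name, the seed, and the labels "old" and "new" are hypothetical; the only logic taken from the text is the random split into two half-groups that take the forms in opposite orders.

```python
import random


def counterbalanced_assignment(examinee_ids, seed=0):
    """Randomly split examinees into two half-groups with opposite form orders."""
    rng = random.Random(seed)
    ids = list(examinee_ids)
    rng.shuffle(ids)  # random assignment to half-groups
    half = len(ids) // 2
    orders = {}
    for position, examinee in enumerate(ids):
        # First half takes the old form first; second half takes the new form first.
        orders[examinee] = ("old", "new") if position < half else ("new", "old")
    return orders
```

Every examinee still takes both forms, so each form is administered first to one half-group and second to the other.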

Equivalent-groups  designs

With  single-group  designs,  it  is  also  important  to  administer  both  tests  on 
the  same  day  so  intervening  experiences  do  not  affect  the  results.  However,  it 
is  difficult  in  practice  to  arrange  the  required  time  block.  Equivalent-groups 
designs  are  a  simple  alternative.  The  two  tests  to  be  equated  are  given  to  two 
different  random  groups  from  the  same  population.  However,  differences  in  the 
ability  distributions  of  the  groups  may  introduce  an  unknown  degree  of  bias 
(Hambleton  &  Swaminathan,  1985).  Because  there  are  no  common  data,  it  is 
impossible  to  adjust  for  any  random  differences  (Petersen  et  al.,  1989).  Several 
researchers  have  studied  the  effects  of  these  different  group  ability  distributions 
on  equating  results. 

Harris  and  Kolen  (1986)  investigated  the  effect  of  differences  in  group 
ability  on  the  equating  of  the  American  College  Test  (ACT)  Math  test.  Although 
their  results  showed  score  equivalents  somewhat  higher  for  low-ability  students 
and  lower  equivalent  scores  for  high-ability  examinees,  the  differences  were  not 
significant.  The  authors  concluded  that  the  equatings  were  robust  to  even  large 
differences  in  group  ability  distributions. 

Similar  results  were  found  by  Angoff  and  Cowell  (1986)  when  they 
studied the population independence of equating transformations using
Graduate Record Examination (GRE) data. Some minor discrepancies were
discovered,  but  the  majority  were  not  significant  in  horizontal  equating 
situations. 

Cook,  Eignor,  and  Taft  (1988)  hypothesized  that  differences  in  ability 
were  expected  when  the  groups  took  the  two  tests  to  be  equated  at  different 
times  of  the  year.  Two  forms  of  the  Biology  achievement  test  were 
administered.  One  form  was  given  in  the  fall  mainly  to  high  school  seniors,  and 
the  other  form  was  administered  predominantly  to  sophomores  in  the  spring. 
Two  fall  administrations  were  also  equated  and  studied.  Because  recency  of 
instruction  is  important  in  some  parts  of  this  type  of  achievement  test  and  most 
students  study  Biology  in  tenth  grade,  disparate  results  were  attained  from  the 
fall/spring  equating.  The  spring  sample,  containing  mostly  students  who  had 
just  completed  the  subject  tested,  received  higher  scaled  scores  than  the  fall 
sample.  In  this  study,  the  construct  measured  by  the  test  depended  on  the 
sample  of  examinees  to  whom  the  test  was  administered.  In  contrast,  the 
fall/fall  equating  was  robust  to  group  differences.  This  study  demonstrates  the 
importance  of  administering  the  test  forms  to  be  equated  at  the  same  time, 
especially when the content is instructionally sensitive.

Anchor-test  designs

Lord  (1980)  stated  the  differences  between  two  samples  of  examinees 
can  be  measured  and  controlled  by  administering  to  each  examinee  an  anchor 
test measuring the same ability as tests X and Y. When an anchor test is used,
equating may be carried out even when the two groups are not at the same ability
level.  The  groups  may  be  random  groups  from  the  same  population  or  they  may 
be  nonequivalent  or  naturally  occurring  groups.  The  scores  on  the  anchor  test 
can  be  used  to  estimate  the  performance  of  the  combined  group  (Cook  & 
Petersen,  1987).  The  anchor  test  may  be  an  internal  part  of  both  tests  X  and  Y, 
or  it  may  be  an  external  separate  test.  If  an  external  anchor  test  is  used,  it  should 
be  administered  after  X  or  Y  to  avoid  practice  effects  on  the  tests  to  be  equated 
(Lord,  1980).  The  anchor-test  design,  while  the  most  complicated  of  the  data 
collection  methods,  is  the  most  common  in  real  testing  situations.  Constraints  of 
time  or  available  samples  placed  on  large  testing  programs  often  require  its  use 
(Skaggs  &  Lissitz,  1986a). 

Properties  of  the  anchor  test  can  seriously  affect  the  ensuing  equating 
results.  Klein  and  Jarjoura  (1985)  studied  the  properties  and  characteristics  of 
anchor-test  items  in  relation  to  the  total  test.  A  test  of  250  items  was  equated 
using  three  different  anchor  tests.  Although  all  anchors  were  similar  to  the  total 
test  in  difficulty,  only  one  of  the  anchor-tests  was  representative  of  the  total  test 
content.  The  results  confirmed  the  importance  of  including  items  on  the  anchor 
test  that  mirror  as  nearly  as  possible  the  content  of  the  total  test. 

In  addition  to  content  representativeness,  the  relative  position  of  items  in 
test  books  also  seems  to  play  an  important  role  in  anchor-test  design.  Kingston 
and  Dorans  (1984)  examined  relative  position  effects  of  items  in  a  version  of  the 
GRE General Test. Although the equatings of the Verbal measure of the test
were in close agreement, the Quantitative and Analytical measures showed
sensitivity  to  relative  item  position.  When  possible,  it  is  preferable  to  include  the 
anchor  items  spiralled  throughout  the  test  in  their  operational  positions. 

The  length  of  the  anchor  test  is  another  concern  and  the  subject  of  several 
studies.  Klein  and  Kolen  (1985)  used  a  certification  test  to  examine  the 
relationship  between  anchor  test  length  and  accuracy  of  equating  results.  The 
authors  used  anchor  tests  of  varying  lengths  and  examinee  groups  both  similar 
and  dissimilar  in  ability  distribution.  They  concluded  that  when  groups  have 
similar  ability  distributions,  the  anchor  test  length  has  little  effect.  However,  as 
group  ability  distributions  become  more  dissimilar,  longer  anchor  tests  work  best. 
Klein  and  Kolen  also  found  that  anchor  tests  should  correspond  closely  with  the 
total  test  in  content  representation,  difficulty,  and  discrimination. 

The  study  of  Cook  et  al.  (1988)  is  also  pertinent  to  the  question  of  anchor- 
test  length.  When  the  groups  differ  in  level  of  ability,  as  did  the  spring  and  fall 
samples,  different  anchor  test  lengths  yielded  disparate  results.  In  contrast,  when 
the  groups  have  similar  ability  distributions,  like  the  two  fall  samples,  the 
equatings  are  similar  for  different  anchor  test  lengths. 

When  applying  item  response  theory  equating  methods,  anchor  items  are 
usually  referred  to  as  linking  items.  These  linking  items  are  used  to  scale  the 
item  parameter  estimates.  Equating  with  IRT  requires  that  the  item  parameter 
estimates  for  the  two  test  forms  be  on  the  same  scale  before  equating.  The 
quality of the equating depends largely on how well this item scaling is
accomplished  (Cook  &  Petersen,  1987).  Wingersky  and  Lord  (1984)  studied  the 
problem  of  the  optimal  number  of  linking  items  in  the  context  of  IRT  concurrent 
calibration.  The  authors  concluded  that  two  linking  items  with  small  standard 
errors  of  estimation  worked  almost  as  well  as  a  set  of  25  linking  items  with  large 
standard  errors  of  estimation. 

Wingersky,  Cook,  and  Eignor  (1986)  studied  the  characteristics  of  linking 
items  and  their  effects  on  IRT  equating.  Monte  Carlo  procedures  were  used  with 
parameter  values  set  to  imitate  those  estimated  from  the  Verbal  sections  of  the 
College  Board  Scholastic  Aptitude  Test  (SAT-V).  These  values  were  selected  to 
make the simulation as realistic as possible. Linking test lengths of 10, 20, and
40 items were used as well as variations in the size of the standard errors of
estimation and distributions of examinee ability. Scaling was accomplished by
both concurrent calibration and characteristic curve methods. The results of this
study showed little difference between the two scaling methods, and the accuracy
of both equating methods improved as the number of linking items increased.
Unlike  the  findings  of  Wingersky  and  Lord  (1984),  linking  items  having  standard 
errors  of  estimation  similar  to  those  found  in  actual  SAT-V  items  provided  slightly 
better  equating  outcomes  than  those  chosen  to  have  small  errors  of  estimation. 

The  studies  reviewed  clearly  indicate  that  the  properties  of  an  anchor  test 
are  of  great  concern.  Anchor  or  linking  items  should  remain  in  the  same  relative 
positions  in  new  and  old  forms  and  as  many  anchor  items  as  possible  should  be 
used (Cook & Eignor, 1988). The question of optimal anchor test length becomes
even more important as the ability distributions of the samples used in equating
become more dissimilar. Because anchor test designs are usually used in
situations  where  ability  distributions  of  the  groups  may  vary  to  an  unknown 
degree,  the  conclusions  have  important  implications.  The  anchor  test  must  also 
closely  mirror  the  total  test  to  be  equated  in  statistical  properties  and  content 
representativeness.  As  the  correlation  between  scores  on  the  anchor  test  and 
the  scores  on  the  new  and  old  forms  becomes  higher,  the  ensuing  equating  also 
improves  (Cook  &  Petersen,  1987). 

Many  factors  may  affect  equating  results.  Because  the  purpose  of 
equating  is  to  create  a  relationship  between  two  tests  so  it  makes  no  difference  to 
the  examinee  which  test  is  administered,  each  of  these  factors  must  be  carefully 
considered  in  deciding  on  the  equating  design.  Some  general  guidelines  to 
successful equating are summarized in Table 1. Only after these factors have
been  carefully  considered  and  the  data  have  been  collected,  can  a  specific 
equating  method  be  chosen. 

Equating  Methods 
Conventional  Methods  of  Equating 

Once  the  data  have  been  collected  using  one  of  the  data  collection 
designs  reviewed,  mathematical  procedures  are  applied  to  the  data  to  develop 
the  equating  transformation.  Many  such  methods  exist,  some  based  on  classical 
test  theory  and  others  on  item  response  theory  (IRT).  The  conventional  methods, 
those  arising  from  classical  test  theory,  may  be  categorized  as  linear  equating  or 
equipercentile  equating. 


Table 1
Summary of Recommendations for a Successful Equating

Total Test
    Well-defined content specifications
    Item selection based on statistical data from field testing
    Length of at least 35 items

Examinees
    Sample size of at least 500
    Better results with groups similar in ability

Administrative
    Strictly controlled testing conditions
    Security of tests and items is maintained
    Scoring is controlled

Anchor Tests
    Representative of the total test in difficulty and discrimination
    Similar to the total test in content specifications
    Common items are in approximately the same position in the old and new forms
    Common items are identical in both forms
    About 20% - 30% of total test length


Linear  equating 

In  horizontal  equating,  the  two  tests  to  be  equated  are  similar  in 
difficulty.  When  administered  to  the  same  group  of  examinees,  the  raw  score 
distributions  are  assumed  to  be  different  only  with  respect  to  the  means  and 
standard  deviations  (Hambleton  &  Swaminathan,  1985).  Linear  equating  is 
based  on  this  assumption.  A  transformation  is  identified  such  that  scores  on  X 
and  Y  are  considered  to  be  equated  if  they  correspond  to  the  same  number  of 
standard  deviations  above  or  below  the  mean  in  some  population.  The  two 
scores  are  equivalent  if 

$$\frac{X - \mu_X}{\sigma_X} = \frac{Y - \mu_Y}{\sigma_Y} \qquad (1)$$

These scores will have the same percentile rank if the distributions are the
same (Crocker & Algina, 1986).
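
As a rough illustration of Equation 1, the sketch below solves the standardized-score identity for the Form X score, which yields a slope equal to the ratio of the standard deviations and an intercept that aligns the two means. The function name and the raw scores are hypothetical, and the moments are taken from a single group that took both forms; the Tucker and Levine procedures estimate these moments differently when an anchor test is involved.

```python
from statistics import mean, pstdev

def linear_equate(y, x_scores, y_scores):
    """Map a Form Y score to the Form X scale by matching z-scores."""
    mu_x, mu_y = mean(x_scores), mean(y_scores)
    sd_x, sd_y = pstdev(x_scores), pstdev(y_scores)
    slope = sd_x / sd_y                # ratio of standard deviations
    intercept = mu_x - slope * mu_y    # aligns the two means
    return slope * y + intercept

# Hypothetical raw scores from one group that took both forms.
form_x = [12, 15, 18, 21, 24]
form_y = [10, 12, 14, 16, 18]
print(linear_equate(14, form_x, form_y))
```

Because the transformation is a straight line, it can only adjust for differences in mean and spread, which is the assumption stated above.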

Many variations of linear equating models exist whose details may be
found in the literature (Angoff, 1971; Holland & Rubin, 1982; Marco et al.,
1983). Two of the more commonly used models are the Tucker model and the
Levine equally reliable model. Both of these procedures produce an equating
transformation of the form:

$$L_P(y) = Ay + B \qquad (2)$$

where $L_P(y)$ is the linear equating function for equating Y to X (Dorans, 1990).
Adaptations  of  this  formula  exist  for  dealing  with  an  anchor  test,  usually 
labelled  V,  when  it  is  or  is  not  part  of  the  reported  score.  The  difference 
between the Tucker model and the Levine equally reliable model lies in their
underlying  assumptions.  Full  discussions  of  these  assumptions  and 
derivations  of  the  appropriate  formulas  may  be  found  in  Dorans  (1990). 

Many  studies  have  been  conducted  to  assess  the  accuracy  of  linear 
equating  methods.  Skaggs  and  Lissitz  (1986b)  carried  out  a  simulation  study 
with  an  external  anchor  design.  Both  difficulty  and  discrimination  values  were 
manipulated.  The  authors  discovered  unacceptable  results  with  linear 
equating  when  the  discrimination  means  were  unequal  on  the  two  tests. 

Marco,  Petersen,  and  Stewart  (1983)  used  40  different  linear  equating 
models  to  transform  SAT-V  data.  Both  similar  and  dissimilar  samples  were 
used,  as  well  as  variations  of  anchor  test  designs  and  characteristics  of  the 
total  tests.  Some  generalizations  reached  from  the  results  of  this  ambitious 
study  are  as  follows: 

1. When a test is equated to a test or form like itself through a parallel
anchor  test  and  the  ability  distributions  of  the  samples  are  identical,  a  linear 
model  yields  very  good  results. 

2.  When  a  test  is  equated  to  a  test  or  form  like  itself  through  an  easy  or 
difficult  anchor  test  with  random  samples,  all  of  the  models  have  a  small  mean 
square  error. 

3.  When  samples  with  dissimilar  ability  distributions  are  used,  linear 
equating  does  not  perform  well. 

4.  When  total  tests  differ  in  difficulty,  linear  models  yield  unsatisfactory 
results. 


Two  methods  of  selecting  samples  and  five  methods  of  equating, 
including  two  linear  methods,  were  combined  in  a  study  by  Livingston,  Dorans, 
and  Wright  (1990).  Again,  when  the  samples  differed  in  ability  distributions  the 
linear  equatings  were  inaccurate,  showing  a  large  negative  bias.  Matching  the 
samples  on  the  basis  of  the  anchor  test  did  little  to  improve  the  results.  The 
authors  recommended  dealing  with  ability  differences  by  selecting  a 
representative  sample  from  each  population  and  choosing  an  equating  method 
that  does  not  assume  exchangeability  for  examinees  based  on  their  anchor 
test  scores. 

Based  on  these  studies,  it  can  be  seen  that  linear  equating  methods  are 
distribution  dependent.  Although  linear  equating  may  perform  satisfactorily  in 
optimal  conditions,  it  is  likely  to  produce  bias  in  real  testing  situations. 
Equipercentile  equating 

In  equipercentile  equating,  a  transformation  is  chosen  so  that  raw 
scores  on  two  tests  are  considered  to  be  equated  if  they  have  the  same 
percentile  rank  (Angoff,  1971).  This  is  based  on  the  definition  that  score 
scales  are  comparable  for  two  tests  if  their  respective  score  distributions  are 
identical  in  shape  for  some  population  (Braun  &  Holland,  1982).  When  this  is 
true,  a  table  of  pairs  of  raw  scores  can  be  constructed.  Because  the  pairs  of 
raw  scores  are  not  necessarily  numerically  equal,  it  is  necessary  to  transform 
one set of scores into the other set or to convert both sets to a new score
(Petersen  et  al.,  1989).  In  mathematical  terms,  the  equipercentile  equating 
function  for  equating  Y  to  X  on  population  P  is 

$$E_P(y) = F_P^{-1}(G_P(y)) \qquad (3)$$

where $G_P(y)$ is the cumulative distribution of Y scores and $F_P^{-1}(\cdot)$ is the
inverse of the cumulative distribution of X scores, $F_P(x)$. A cumulative
distribution  function  maps  scores  onto  relative  frequencies,  while  an  inverse 
cumulative  distribution  function  maps  the  relative  frequencies  onto  scores 
(Dorans,  1990). 
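
The composition in Equation 3 can be sketched for discrete raw scores as follows. This is a minimal, unsmoothed version that assumes mid-percentile ranks for the cumulative distributions and linear interpolation for the inverse; both function names are hypothetical, and operational procedures typically add smoothing that is not shown here.

```python
def mid_percentile_rank(value, scores):
    """Proportion scoring below `value` plus half the proportion at `value`."""
    below = sum(s < value for s in scores)
    at = sum(s == value for s in scores)
    return (below + 0.5 * at) / len(scores)

def equipercentile_equate(y, x_scores, y_scores):
    """Return the Form X score whose percentile rank matches that of y."""
    p = mid_percentile_rank(y, y_scores)          # cumulative distribution of Y
    xs = sorted(set(x_scores))
    ranks = [mid_percentile_rank(x, x_scores) for x in xs]
    # Invert the X distribution by interpolating between observed X scores.
    if p <= ranks[0]:
        return float(xs[0])
    if p >= ranks[-1]:
        return float(xs[-1])
    for k in range(1, len(xs)):
        if p <= ranks[k]:
            frac = (p - ranks[k - 1]) / (ranks[k] - ranks[k - 1])
            return xs[k - 1] + frac * (xs[k] - xs[k - 1])
```

Scores beyond the observed range are clamped to the lowest or highest Form X score in this sketch, which is one place where sparse data at the extremes can distort the result.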

As  a  mathematical  model,  equipercentile  equating  makes  no 
assumptions  about  the  tests  to  be  equated.  It  simply  compresses  and 
stretches  the  score  units  on  one  test  so  that  its  raw  score  distribution  matches 
the  second  test.  It  is  only  consideration  of  the  purpose  of  equating  and  the 
desired  condition  of  population  invariance  that  prevents  its  application  to  tests 
measuring  different  constructs  (Petersen  et  al.,  1989). 

Generally,  empirical  studies  have  shown  mixed  results  in  assessing  the 
accuracy  of  equipercentile  equating.  Livingston,  Dorans,  and  Wright  (1990) 
included  an  equipercentile  equating  method  in  their  study.  A  composite  of  two 
equipercentile  equatings,  the  procedure  worked  well  in  most  situations. 
Similarly,  the  equipercentile  equating  produced  acceptable  results  in  all 
combinations  of  conditions  in  the  Skaggs  and  Lissitz  (1986b)  study. 

On  the  other  hand,  in  the  investigation  conducted  by  Petersen  et  al. 
(1983) using SAT data, equipercentile equating was studied along with the
Tucker  Equally  Reliable  and  Levine  Unequally  Reliable  linear  models  and  three 
IRT  methods.  The  equipercentile  equating  produced  the  worst  results  of  all 
the  methods  investigated.  This  was  especially  true  for  the  Verbal  Test. 

In  a  1983  study  by  Cook  and  Eignor  reported  in  Skaggs  and  Lissitz 
(1986a),  alternate  forms  of  the  biology,  mathematics,  and  social  studies 
achievement  tests  of  the  GRE  were  equated  using  various  procedures.  Again, 
results  varied  by  test  content,  but  the  equipercentile  method  was  inadequate  in 
all  cases.  Cook  and  Eignor  felt  that  equipercentile  equating  may  have  suffered 
from  a  lack  of  data  at  the  extreme  scores. 

The  Cook  et  al.  (1988)  equatings  with  biology  achievement  test  data 
also  uncovered  mixed  results.  Although  the  equipercentile  equating  method 
performed  adequately  with  the  parallel  fall-to-fall  samples,  it  was  not  sufficiently 
robust  to  the  ability  differences  found  in  equating  the  fall  and  spring  samples. 

These  mixed  findings  raise  some  concerns  about  the  application  of 
equipercentile  equating.  When  raw  scores  are  used,  this  method  does  not 
meet  the  conditions  for  equating.  Hambleton  and  Swaminathan  (1985)  noted 
that  a  nonlinear  transformation  is  needed  to  equalize  the  moments  of  the  two 
distributions,  resulting  in  a  nonlinear  relationship  between  the  raw  scores  and 
the  true  scores.  In  turn,  this  implies  that  the  tests  are  not  equally  reliable  and  it 
is  no  longer  a  matter  of  indifference  to  the  examinee  which  form  is  taken. 
Besides  violating  the  equity  condition,  the  equipercentile  equating  process  is 
population  dependent. 


For  the  past  forty  years,  large  scale  testing  programs  publishing  multiple 
forms  of  examinations  have  used  an  equating  process.  Until  recently,  most 
have  employed  one  of  the  conventional  linear  or  equipercentile  procedures 
described.  But  recent  psychometric  developments  have  presented  an 
alternative. 

Equating  Methods  Based  on  Item  Response  Theory 
Item  response  theory 

A  brief  introduction  to  item  response  theory  is  essential  to  an 
understanding  of  the  following  equating  procedures.  Item  response  theory 
(IRT)  is  an  attempt  to  model  an  examinee's  performance  on  a  test  item  as  a 
function  of  the  characteristics  of  the  item  and  the  examinee's  ability  on  some 
unobserved,  or  latent,  trait.  The  IRT  model  specifies  the  relationship  between 
a  latent  trait  and  the  observed  performance  on  items  designed  to  measure  that 
trait. 

This  relationship  can  then  be  depicted  graphically  by  an  item 
characteristic  curve  (ICC).  The  ICC  depicts  the  probability  that  an  examinee  at 
any  given  ability  level  will  make  a  correct  response  to  an  item.  The  graph  is 
typically an S-shaped curve with ability, symbolized by $\theta$, plotted on the
horizontal axis and the probability of a correct response to item i, $P_i(\theta)$, plotted
on  the  vertical  axis. 

Many  different  mathematical  models  may  be  used  to  depict  this 
functional relationship. Most common in practice is the logistic class of
models, owing to its ease of estimation. Birnbaum (1968) proposed a
two-parameter logistic model (2PL) of the form

$$P_i(\theta) = \left[1 + e^{-D a_i(\theta - b_i)}\right]^{-1} \qquad (4)$$

where $b_i$ is the difficulty value, $a_i$ is the discrimination parameter, and D is a
scaling factor, normally 1.7.

The three-parameter logistic model (3PL) adds a third parameter,
denoted $c_i$, referred to as the lower asymptote. The mathematical form of the
3PL model is written as

$$P_i(\theta) = c_i + (1 - c_i)\left[1 + e^{-D a_i(\theta - b_i)}\right]^{-1} \qquad (5)$$

with $a_i$, $b_i$, and D defined as before. The value of $c_i$ is typically smaller
than the value that would result if examinees were to make a random response
to the item (Hambleton & Swaminathan, 1985). Figure 1 depicts an ICC based
on the 3PL model.

The  one-parameter  logistic  model,  or  Rasch  model,  assumes  all  items 
have  equal  discrimination  and  no  guessing  occurs.  This  model  is  written 

$$P_i(\theta) = \left[1 + e^{-D(\theta - b_i)}\right]^{-1} \qquad (6)$$

where  the  parameters  are  defined  as  in  the  previous  models. 
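
The three logistic models in Equations 4 through 6 can be written as one function with two special cases. The sketch below is illustrative only: the function names are hypothetical, and the one-parameter case is assumed to fix the common discrimination at 1.

```python
import math

D = 1.7  # scaling factor given in the text

def icc_3pl(theta, a, b, c=0.0):
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def icc_2pl(theta, a, b):
    """2PL: the 3PL model with no lower asymptote."""
    return icc_3pl(theta, a, b, c=0.0)

def icc_1pl(theta, b):
    """1PL: equal discrimination (assumed to be 1) and no lower asymptote."""
    return icc_3pl(theta, 1.0, b, c=0.0)

# At theta equal to b the curve sits halfway between c and 1.
print(icc_3pl(0.5, a=1.2, b=0.5, c=0.2))
```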

Cursory  examination  of  the  three  IRT  logistic  models  may  lead  to  the 
conclusion  that  they  form  a  type  of  hierarchy  from  least  to  most  specific. 
However, the three models represent very different philosophical perspectives
of  measurement  theory  (Skaggs  &  Lissitz,  1986a).  It  is  these  differences  that 
must  be  considered  when  selecting  a  model  for  a  particular  application. 


[Figure 1 plot not reproduced: an S-shaped curve, with annotations marking $a_i$ (the slope) and $b_i$ (the point at which the slope is maximized).]

Figure 1. An item characteristic curve (ICC) based on the three-parameter
logistic model


The  use  of  any  of  the  IRT  models  entails  restrictive  assumptions  about 
the  item  response  process.  Briefly  stated,  the  major  assumptions  of  IRT  are 
as  follows: 

1.  The  ICC  accurately  represents  the  data. 

2.  The  data  are  unidimensional. 

3. Responses are locally independent (Skaggs & Lissitz, 1986a).

An ICC is defined completely when its general form is specified and
when the parameters of a particular item are known (Hambleton &
Swaminathan, 1985). This leads to the basic advantage of IRT models. When
the data fit the model reasonably well, it is possible to demonstrate the
invariance  of  item  and  ability  parameters.  When  the  item  parameters  are 
known,  an  examinee's  ability  may  be  estimated  from  any  subset  of  the  items. 
Also,  item  parameters  may  be  calibrated  with  any  sample  drawn  from  a 
sufficiently  large  population  (Skaggs  &  Lissitz,  1986a).  These  advantages 
cannot  be  derived  from  classical  test  theory  and  should  have  tremendous 
consequences  for  equating  with  item  response  theory. 

All  of  the  practical  IRT  models  are  based  on  the  unidimensionality 
assumption.  This  states  that  the  probability  of  a  correct  response  by 
examinees  to  a  set  of  items  can  be  mathematically  modeled  by  using  only  one 
ability  parameter  (Kingston  &  Dorans,  1984).  According  to  Lord  (1980),  while 
ability  is  probably  not  normally  distributed  for  most  groups  of  examinees, 
unidimensionality  is  a  property  of  the  items  and  does  not  cease  to  exist 
because  the  examinee  group  is  changed  in  distribution. 

Because  the  items  on  a  test  are  assumed  to  measure  only  one  common 
trait,  for  all  examinees  with  the  same  ability  the  item  responses  are 
independent  of  one  another.  This  is  the  local  independence  assumption.  The 
probability  of  success  on  any  given  item  depends  on  the  item  parameters, 
examinee  ability,  and  nothing  else.  In  determining  the  probability  of  a  correct 
response  to  a  specific  item,  success  or  failure  on  other  items  will  add  no  new 
information  if  ability  is  known  (Lord,  1980). 
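
Under local independence, the probability of an entire response pattern at a fixed ability is the product of the individual item probabilities. The sketch below illustrates this with the 3PL form; the function names and the (a, b, c) tuple layout are assumptions made for the example.

```python
import math

def p_correct(theta, a, b, c=0.0, D=1.7):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def pattern_likelihood(theta, responses, items):
    """Likelihood of a 0/1 response pattern at a fixed ability theta.

    `items` is a list of (a, b, c) tuples, one per response.
    """
    likelihood = 1.0
    for u, (a, b, c) in zip(responses, items):
        p = p_correct(theta, a, b, c)
        likelihood *= p if u == 1 else (1.0 - p)
    return likelihood
```

Because each factor depends only on theta and that item's parameters, knowing the response to one item changes nothing about another once theta is fixed.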

Good  estimation  of  the  item  and  ability  parameters  is  of  paramount 
importance in describing the data accurately. Many investigators have
explored  the  effect  of  the  number  of  items  and  the  number  of  examinees  on 
parameter  estimation  for  IRT  models.  The  results  of  these  studies  varied 
according  to  the  estimation  procedure  used.  Available  estimation  methods 
include  (a)  joint  maximum  likelihood  estimation  (JML),  (b)  conditional  maximum 
likelihood estimation (CML), (c) marginal maximum likelihood estimation
(MML),  and  (d)  Bayesian  estimation  (BE).  Full  explanations  of  the  various 
procedures  may  be  found  in  Hambleton  and  Swaminathan  (1985). 

Much  of  the  research  on  parameter  estimation  employed  the  JML 
procedure  as  implemented  by  the  computer  program  LOGIST  (Wood, 
Wingersky,  &  Lord,  1976).  These  reports  will  not  be  reviewed  here,  but  the 
interested  reader  is  referred  to  Harrison  (1986),  Hulin  et  al.  (1982),  Lord 
(1968),  Ree  (1979),  Swaminathan  and  Gifford  (1983,  1985),  and  Wingersky 
and  Lord  (1984).  In  general,  a  sample  size  of  at  least  1,000  and  test  length  of 
50  or  more  items  is  required  for  acceptable  estimation  with  the  JML  procedure 
of LOGIST. One major problem uncovered by these studies is that consistent
estimates of the item parameters cannot be obtained in the presence of
examinee ($\theta$) parameters because the number of the latter increases with
sample size (Baker, 1990).

This  problem  can  be  overcome  by  using  the  MML  procedure 
implemented  in  the  BILOG  computer  program  (Mislevy  &  Bock,  1987).  The 
examinees' $\theta$ parameters are removed from item parameter estimation by
integrating them over an assumed unit normal prior distribution. At this point in
the procedure, it is not the $\theta$ of each examinee that has been estimated, but
the form of the $\theta$ distribution. The item parameters are first estimated, followed
by the $\theta$ parameters at a later stage (Baker, 1990).
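
The idea of integrating ability out of the likelihood can be sketched numerically. The example below approximates the marginal probability of a response pattern by summing over an equally spaced grid weighted by the unit normal density. The grid size, its range, and the function names are assumptions chosen for illustration and are not taken from BILOG.

```python
import math

def p_correct(theta, a, b, c=0.0, D=1.7):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def normal_grid(n_points=41, lo=-4.0, hi=4.0):
    """Equally spaced points with normalized N(0, 1) weights."""
    step = (hi - lo) / (n_points - 1)
    nodes = [lo + k * step for k in range(n_points)]
    dens = [math.exp(-0.5 * t * t) for t in nodes]
    total = sum(dens)
    return nodes, [d / total for d in dens]

def marginal_probability(responses, items, n_points=41):
    """P(response pattern) with theta integrated over a unit normal prior."""
    nodes, weights = normal_grid(n_points)
    total = 0.0
    for theta, w in zip(nodes, weights):
        like = 1.0
        for u, (a, b, c) in zip(responses, items):
            p = p_correct(theta, a, b, c)
            like *= p if u == 1 else 1.0 - p
        total += w * like
    return total
```

The resulting quantity depends only on the item parameters, which is why they can be estimated without estimating each examinee's ability at the same time.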

In  addition  to  MML,  the  BILOG  program  allows  Bayesian  maximum  a 
posteriori  estimation  (MAP)  and  Bayesian  expected  a  posteriori  estimation 
(EAP) of $\theta$ parameters. Mislevy and Stocking (1989) have recommended the
EAP procedure with a unit normal prior for the $\theta$ distribution. Specifying this
prior for abilities limits extreme values of the $\theta$ estimates and the resulting
variances  will  tend  to  be  smaller  than  with  MML.  When  the  value  of  the 
variance  is  smaller,  the  prior  distribution  becomes  more  concentrated  and  pulls 
the  estimated  parameters  toward  the  mean  of  the  distribution. 
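
An expected a posteriori estimate is the mean of the posterior distribution of ability. The sketch below approximates it on an equally spaced grid with a unit normal prior and 2PL items; the grid settings and names are assumptions for illustration, not BILOG's implementation.

```python
import math

def p_correct(theta, a, b, D=1.7):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def eap_theta(responses, items, n_points=41, lo=-4.0, hi=4.0):
    """Posterior mean of theta under a unit normal prior (grid approximation).

    `items` is a list of (a, b) tuples, one per response.
    """
    step = (hi - lo) / (n_points - 1)
    num = den = 0.0
    for k in range(n_points):
        theta = lo + k * step
        post = math.exp(-0.5 * theta * theta)   # unnormalized N(0, 1) prior
        for u, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            post *= p if u == 1 else 1.0 - p
        num += theta * post
        den += post
    return num / den
```

Even an all-correct pattern yields a finite estimate here, since the prior keeps the posterior mean away from the extremes of the scale.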

Yen  (1987)  compared  LOGIST  and  BILOG  for  accuracy  of  item 
parameter  estimation.  Test  lengths  of  10,  20,  and  40  items  were  simulated 
with  a  sample  of  1,000  examinees.  The  ability  distributions  examined  were 
normal,  positively  skewed,  negatively  skewed,  and  symmetric.  Item  difficulty 
was  also  manipulated.  The  BILOG  estimates  were  more  accurate  than  those 
of  LOGIST  in  almost  every  situation.  The  advantage  of  BILOG  was  even  more 
pronounced  for  the  small  item  set.  Although  ability  distribution  had  no 
substantial  effect  on  the  estimation  of  the  ICCs,  discrimination  and  pseudo- 
chance  parameters  were  somewhat  inaccurate  with  BILOG  in  the  case  of  the 
negatively  skewed  distribution. 


In  addition  to  investigating  the  effect  test  length  had  on  item  and  ability 
parameter estimates derived from LOGIST and BILOG procedures, Qualls and
Ansley  (1985)  studied  the  sample  size  effect.  Sample  sizes  of  200,  500,  and 
1,000  examinees  with  a  normal  ability  distribution  were  combined  with  test 
lengths  of  10,  20,  and  30  items.  As  sample  size  increased,  both  procedures 
produced  estimates  more  highly  correlated  with  the  simulated  values.  The 
BILOG  estimates  were  slightly  better  in  all  cases  and  superior  in  the 
combination  of  small  sample  size  with  10  items. 

Buhr  and  Algina  (1986)  used  BILOG  with  four  methods  of  estimation 
and sample sizes of 250, 500, 750, and 1,000 to study the similarity of
estimation.  The  Bayesian  procedures  were  the  most  robust  in  dealing  with 
different  ability  distributions.  Estimation  with  all  procedures  improved 
substantially  as  sample  size  increased  to  500,  but  showed  little  additional 
effect  as  sample  size  increased  further. 

Baker  (1990)  simulated  item  response  data  based  on  a  45-item  test  with 
500  examinees  to  study  the  pattern  of  estimation  results  as  a  function  of  the 
various  analysis  operations.  The  data  were  analyzed  under  the  options 
available  in  BILOG  and  the  obtained  parameter  estimates  were  equated  back 
to  the  true  metric.  The  equated  results  were  generally  very  close  to  the  true 
parameters.  The  item  parameters  were  only  slightly  affected  by  the 
characteristics of various priors. The equated means of the estimated $\theta$s were
somewhat  higher  than  the  true  values,  both  when  priors  were  and  were  not 
imposed  on  the  item  discriminations. 
IRT  equating 

Nothing  in  IRT  contradicts  the  basic  conclusions  of  classical  test  theory. 
Additional  assumptions  are  made  that  allow  answers  not  available  under 
classical  test  theory  (Lord,  1980).  The  theoretical  advantage  of  IRT  models  is 
that once a set of items has been fitted to an IRT model, it is possible to
estimate  the  ability  of  examinees  who  have  taken  a  different  set  of  items.  To 
accomplish  this,  the  items  must  be  measuring  the  same  latent  trait  and  must 
be  on  the  same  scale  (Petersen  et  al.,  1989).  When  this  is  true  and  the  item 
parameters  are  known,  it  will  make  no  difference  to  the  examinee  what  subset 
of  items  is  administered.  Therefore,  in  the  context  of  IRT,  equating  is  not 
necessary  (Hambleton  &  Swaminathan,  1985). 

However, when both item and ability parameters are unknown, it is
necessary to choose an arbitrary metric for either the ability parameter $\theta$ or the
item difficulty $b_i$. Because all the models for $P_i(\theta)$ are functions of the
quantity $a_i(\theta - b_i)$, the same constant may be added to every $\theta$ and $b_i$ without
changing the item response function $P_i(\theta)$. Additionally, every $\theta$ and $b_i$ may
be multiplied by a constant and every $a_i$ divided by the same constant without
changing the quantities $a_i(\theta - b_i)$ and $P_i(\theta)$. Therefore, the origin and unit of
measurement of the ability scale are arbitrary and any scale for $\theta$ may be
chosen as long as the same scale is chosen for $b_i$ (Petersen et al., 1989).
This is referred to as indeterminacy of the parameter scale.
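
The indeterminacy can be checked numerically. In the sketch below, ability and difficulty are multiplied by a constant A and shifted by a constant B while discrimination is divided by A, and the response probability is unchanged. The parameter values, the constants, and the function names are hypothetical.

```python
import math

def icc(theta, a, b, c=0.0, D=1.7):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def rescale(theta, a, b, A, B):
    """Apply theta* = A*theta + B, a* = a/A, b* = A*b + B."""
    return A * theta + B, a / A, A * b + B

theta, a, b, c = 0.4, 1.1, -0.3, 0.2        # hypothetical values
theta_s, a_s, b_s = rescale(theta, a, b, A=2.0, B=-1.5)

# The two probabilities agree because a*(theta - b) is unchanged.
print(icc(theta, a, b, c), icc(theta_s, a_s, b_s, c))
```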

If  the  parameters  of  a  set  of  items  are  estimated  separately  for  two 
different  groups  of  examinees,  the  item  parameters  may  appear  to  be  different 
due to the arbitrary fixing of the metric for $\theta$ or $b_i$. However, the two sets of $\theta$s
and $b_i$s should have a linear relationship to each other (Hambleton &
Swaminathan, 1985). The $a_i$s should be the same except for differences in
unit of measurement and, in the 3PL case, the $c_i$s remain unaffected
(Petersen et al., 1989).

The  advantages  of  IRT  equating  are  most  useful  in  the  case  where 
groups  taking  the  two  tests  are  nonrandom  or  intact  groups  (Crocker  &  Algina, 
1986).  Consequently,  the  following  discussion  will  emphasize  uses  of  IRT 
equating  with  an  anchor  test  design.  However,  item  response  theory 
procedures  may  also  be  used  with  single-group  or  equivalent  groups  designs. 

An  anchor  or  linking  test  is  one  method  available  to  put  the  parameters 
for  the  two  tests  on  the  same  scale.  Four  procedures  commonly  used  with  this 
method  are  (a)  concurrent  calibration,  (b)  the  fixed  bs  method,  (c)  the  equated 
bs  method,  and  (d)  the  characteristic  curve  transformation  method. 

In  concurrent  calibration,  parameters  for  the  two  tests  are  estimated 
simultaneously. The linking items, or sometimes common subjects, serve to
unite the two tests and result in item parameter estimates on a common
scale.  This  allows  direct  equating  of  the  two  tests  (Petersen  et  al.,  1989). 


The  parameters  of  each  total  test-anchor  test  combination  are 
estimated  sequentially  in  the  fixed  bs  method.  After  the  item  parameters  have 
been  estimated  for  one  test,  the  item  difficulties  of  the  linking  items  obtained 
from  the  first  calibration  are  used  as  input  for  the  estimation  of  parameters  on 
the  second  test.  The  linking  item  parameters  are  not  reestimated.  The  end 
result  is  item  parameters  for  both  tests  being  placed  on  the  same  scale 
(Petersen,  Cook,  &  Stocking,  1983). 

In  the  equated  bs  method,  the  parameters  for  each  test  are  estimated 
separately.  Then  the  means  and  standard  deviations  of  the  difficulties  for  the 
two  sets  of  linking  items  are  set  to  be  equal.  Ability  estimates  could  also  be 
used for this purpose. This linear transformation is then applied to the $a_i$, $b_i$,
and $\theta$ parameters of the second test (Petersen et al., 1989). Several
variations  of  the  transformation,  including  the  mean  and  sigma  method  and  the 
robust  mean  and  sigma  method,  are  described  in  Hambleton  and 
Swaminathan  (1985).  Also,  Stocking  and  Lord  (1983)  described  a  modification 
which  gives  lower  weights  to  poorly  estimated  parameters  and  outliers. 
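
The basic mean and sigma variant of the equated bs method can be sketched as follows. The slope and intercept are chosen so that the linking-item difficulties from the second calibration have the same mean and standard deviation as those from the first. The function names and the direction of the transformation (new form onto the base scale) are assumptions for illustration, and the robust and weighted variants cited above are not shown.

```python
from statistics import mean, pstdev

def mean_sigma_constants(b_base, b_new):
    """Slope A and intercept B matching the mean and SD of linking-item difficulties."""
    A = pstdev(b_base) / pstdev(b_new)
    B = mean(b_base) - A * mean(b_new)
    return A, B

def to_base_scale(a, b, A, B):
    """Transform one item's (a, b) from the new-form scale to the base scale."""
    return a / A, A * b + B

# Hypothetical linking-item difficulties from two separate calibrations.
b_new = [-1.0, 0.0, 0.5, 1.5]
b_base = [1.5 * x + 0.3 for x in b_new]
A, B = mean_sigma_constants(b_base, b_new)
print(A, B)
```

Only the difficulty estimates enter the calculation of A and B, so any information carried by the discrimination estimates is ignored.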

It  is  most  common  in  both  the  fixed  bs  and  equated  bs  methods  to  use 
only  the  relationship  for  item  difficulties  to  obtain  the  equating  function 
(Hambleton  &  Swaminathan,  1985).  The  characteristic  curve  method  can 
prevent  the  possible  loss  of  information  caused  by  ignoring  the  discrimination 
relationship.  For  the  characteristic  curve  method,  the  parameters  of  each  test 
are calibrated separately. All parameters are then placed on the same scale
by  using  the  two  sets  of  parameter  estimates  from  the  common  items.  A  linear 
transformation  is  obtained  from  minimizing  the  difference  between  the  true 
scores on the linking items. This transformation is then applied to the $a_i$, $b_i$,
and $\theta$ parameters of the second test (Stocking & Lord, 1983). Because it
takes  all  information  into  account,  this  procedure  is  theoretically  an 
improvement  over  the  previous  methods. 
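
The quantity minimized by a characteristic curve method can be sketched as a loss function over candidate constants A and B. The version below averages the squared difference between the linking-item true scores from the two calibrations over a set of ability values. It is only an approximation of the idea: the function names, the use of an arbitrary ability grid, and the equal weighting are assumptions, and in practice a numerical optimizer would search for the minimizing A and B.

```python
import math

def icc(theta, a, b, c, D=1.7):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def tcc(theta, items):
    """True score on a set of items: the sum of the item probabilities."""
    return sum(icc(theta, a, b, c) for a, b, c in items)

def characteristic_curve_loss(A, B, base_items, new_items, thetas):
    """Mean squared gap between linking-item true scores after rescaling."""
    rescaled = [(a / A, A * b + B, c) for a, b, c in new_items]
    gaps = [(tcc(t, base_items) - tcc(t, rescaled)) ** 2 for t in thetas]
    return sum(gaps) / len(gaps)
```

Unlike the mean and sigma approach, this criterion uses the discrimination and lower asymptote estimates as well as the difficulties.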

Sometimes the reporting of abilities in terms of $\theta$ is unacceptable. In
these situations, the $\theta$ value from a test may be converted to its corresponding
true score $\xi$ through

$$\xi = \sum_{i=1}^{n} P_i(\theta) \tag{7}$$

where n is the number of items on the test. Equating of the true scores on the
two tests is then possible (Hambleton & Swaminathan, 1985). The true score
on one test is said to be equated to the true score on a second test if each
corresponds to the same ability level, or if

$$\xi = \sum_{i} P_i(\theta), \qquad \eta = \sum_{j} P_j(\theta) \tag{8}$$

(Skaggs & Lissitz, 1986a). In practice, estimated item parameters are used to
approximate $P_i(\theta)$ and $P_j(\theta)$. Paired values of $\xi$ and $\eta$ are then computed by
substituting a series of arbitrary values for $\theta$ into Equation 8 and calculating $\xi$
and $\eta$ for each $\theta$. These paired values define $\xi$ as a function of $\eta$ and
constitute an equating of these true scores (Lord, 1980).
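
The tabulation described for Equation 8 can be sketched as follows. This is a minimal illustration, not the procedure of any particular program: the 3PL form with D = 1.7, the function names, and the linear interpolation between tabulated points are all assumptions made for the example.

```python
import math

D = 1.7  # logistic scaling constant


def p3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def true_score(theta, items):
    """Sum of item probabilities at theta (Equation 7)."""
    return sum(p3pl(theta, a, b, c) for a, b, c in items)


def true_score_pairs(items_x, items_y, thetas):
    """Return (theta, xi, eta) triples for an increasing series of thetas."""
    return [(t, true_score(t, items_x), true_score(t, items_y)) for t in thetas]


def xi_for_eta(eta, pairs):
    """Interpolate the xi that corresponds to a given eta in the table."""
    for (_, x0, e0), (_, x1, e1) in zip(pairs, pairs[1:]):
        if e0 <= eta <= e1:
            w = (eta - e0) / (e1 - e0)
            return x0 + w * (x1 - x0)
    raise ValueError("eta is outside the tabulated range")
```

Because each item probability increases with ability when the discriminations are positive, both true scores rise together, so the table can be read in either direction.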



The  relationship  between  raw  scores  and  true  scores  on  two  tests  is  not 
necessarily  the  same,  nor  is  an  equating  provided  for  individuals  scoring  below 
the  chance  level  (Petersen  et  al.,  1989).  Observed-score  equating  provides  a 
method  of  predicting  the  raw-score  distribution  of  a  test.  This  procedure  uses 
probabilities  of  correct  responses  under  an  IRT  model  to  generate  a 
hypothetical  joint  distribution  of  item  responses  from  all  examinees  taking  both 
tests.  Conventional  equipercentile  equating  is  then  applied  to  the  new 
distributions  (Skaggs  &  Lissitz,  1986a).  Neither  true-score  nor  observed-score 
equating  is  applied  often  in  practice.  Both  are  complicated  to  calculate  and 
expensive  to  implement. 

Many  researchers  have  investigated  the  accuracy  of  IRT  equating 
methods  using  the  various  IRT  models  and  procedures.  Comparison  of  IRT 
equating  with  conventional  methods  is  also  common.  Marco,  Petersen,  and 
Stewart  (1983)  examined  the  Rasch  and  3PL  models  along  with  the  40  linear 
and  two  equipercentile  equating  methods  previously  discussed.  A  variety  of 
conditions,  including  random  and  dissimilar  samples,  internal  and  external 
anchors,  and  difficulty  levels  of  the  anchor  tests  were  also  studied.  The  two 
IRT  methods  worked  well,  both  with  an  external  anchor  test  equal  in  difficulty 
to  the  total  test  and  with  an  internal  anchor.  With  the  external  anchor  test,  the 
Rasch  results  were  slightly  better  than  with  any  of  the  other  equating  methods 
investigated.  Both  IRT  models  were  clearly  superior  to  the  conventional 
equating methods when the samples differed in ability distributions, but neither
the  Rasch  nor  the  3PL  model  showed  superiority  to  the  other  under  the 
conditions  studied. 

Kolen  (1981)  explored  true-score  and  observed-score  equating  methods 
as  well  as  a  linear  and  an  equipercentile  equating  method.  The  Rasch,  2PL, 
and  3PL  models  were  used  for  the  IRT  equatings.  The  two  forms  of  the  Iowa 
Test  of  Educational  Development  to  be  equated  had  no  common  items.  Each 
test  had  been  administered  to  a  random  sample.  The  true-score  method  for 
the  3PL  model  produced  the  best  results.  When  only  quantitative  items  were 
equated,  the  Rasch  true-score  combination  also  worked  well. 

Kolen  and  Whitney  (1982)  used  the  General  Educational  Development 
Tests (GED) with the Rasch, 2PL, and 3PL IRT models and an equipercentile
equating method. They found that with small samples (N < 198), the 3PL model
produced a number of extreme item parameter estimates that seriously
affected the equating.

In  the  Petersen,  Cook,  and  Stocking  (1983)  study  discussed  earlier  in 
the  context  of  conventional  equating,  a  3PL  model  was  also  examined  using 
concurrent  calibration,  the  fixed  bs  method,  and  the  characteristic  curve 
transformation.  For  the  SAT-V,  all  IRT  models  and  methods  outperformed 
linear  and  equipercentile  equatings.  Both  conventional  and  IRT  methods 
yielded  acceptable  results  for  the  mathematics  test.  Concurrent  calibration 
with  the  3PL  model  produced  the  least  amount  of  error. 



Harris  and  Kolen  (1985)  compared  conventional  equating  methods  with 
IRT  3PL  model  equating.  The  sample  consisted  of  high  and  low  ability 
examinees.  The  3PL  model  was  found  to  be  slightly  superior. 

The  Cook,  Eignor,  and  Taft  (1988)  study  using  biology  achievement 
tests  administered  at  different  points  in  time  included  a  3PL  model  with  the 
characteristic  curve  transformation  in  addition  to  the  equipercentile  equating 
method.  The  authors  concluded  that  the  IRT  results,  although  slightly  superior 
with  the  fall-to-spring  sample  equating,  basically  paralleled  the  results  obtained 
with  the  conventional  method. 

A  minimum-competency  test,  Florida's  Statewide  Student  Assessment 
Test,  Part  II  (SSAT-II)  was  equated  by  Hills,  Subhiyah,  and  Hirsch  (1988). 
Their  purpose  was  to  study  the  effect  of  anchor  length  on  equating  and 
compare  different  equating  methods  using  a  sample  with  a  negatively  skewed 
distribution.  The  equating  methods  investigated  were  linear,  Rasch,  and  3PL. 
The  IRT  models  were  equated  with  concurrent  calibration,  fixed  bs  method, 
and  equated  bs  method  using  robust  mean  and  sigma.  The  authors  concluded 
that  the  3PL  model  with  concurrent  calibration  and  Rasch  models  gave  similar 
good  results.  Also,  when  using  the  3PL  model  with  concurrent  calibration,  an 
anchor  test  length  of  10  items  was  found  to  be  sufficient  for  good  equating 
outcomes. 

Results  of  these  studies  indicate  that  the  3PL  model  tends  to  perform 
better  than  conventional  and  Rasch  equating  in  a  variety  of  situations. 



Equating  with  IRT  appears  to  produce  better  results  than  conventional 
equating  methods,  especially  when  the  ability  distribution  of  the  two  groups  is 
dissimilar.  Concurrent  calibration  and  characteristic  curve  transformation  were 
the  preferred  methods  of  scaling,  although  fewer  linking  items  are  required  with 
concurrent  calibration.  Table  2  contains  a  summary  of  the  equating  studies 
reviewed  here. 

Multidimensionality
Violation of the Unidimensionality Assumption

The  mathematical  models  upon  which  IRT  is  based  are  grounded  on 
very  strong  assumptions,  particularly  that  item  responses  are  unidimensional 
(Ansley  &  Forsyth,  1985).  The  unidimensionality  assumption  requires  that 
each  of  the  tests  to  be  equated  onto  a  common  scale  must  measure  the  same 
underlying  trait  or  ability.  Any  factor  that  influences  an  examinee's  score,  other 
than  the  one  assumed  latent  trait,  will  violate  the  unidimensionality  assumption. 
Although IRT explicitly acknowledges this assumption, other commonly used
procedures that transform scores, such as equipercentile equating, also
assume unidimensionality even if this is not stated explicitly (Hirsch, 1989).
This can be seen by reviewing the required conditions for equating.

There are many factors that may cause multidimensionality, such as
guessing, speededness, fatigue, cheating, random answering, instructional
sensitivity, or item context and content. Two or more cognitive traits may
influence an examinee's response to an item. For example, reading
skill may be required to correctly answer a mathematical item. Some of these
violations can be controlled, reduced, or eliminated, but the unidimensionality
assumption will still be violated in many practical situations (Doody-Bogan &
Yen, 1983). Achievement tests are not constructed using methods that yield
factor pure instruments. Instead, a table of specifications is customarily
developed and items are written to match the specifications. These items
rarely measure a single trait (Reckase, 1979). Due to the many possible
causes leading to violation of the unidimensionality assumption, it can be
concluded that dimensionality is a joint property of both the item set and the
particular sample of examinees (Hattie, 1985).

[Table 2. Summary of the equating studies reviewed]

Multidimensional  Models 

Recently,  attempts  have  been  made  to  model  multidimensional 
responses  within  the  framework  of  IRT.  Several  multidimensional  item 
response theory (MIRT) models have been proposed. Although
multidimensional versions of the one-, two-, and three-parameter logistic IRT
models have been derived, only the multidimensional two-parameter logistic
(M2PL) model will be discussed.

Doody-Bogan and Yen (1983) described a multidimensional model of
the form

$$P_{ij}(\boldsymbol{\theta}_i) = \frac{1}{1 + \exp\left[-D \sum_{h=1}^{m} a_{jh}(\theta_{ih} - b_{jh})\right]} \tag{9}$$

where $\theta_{ih}$ is the ability parameter for person i for dimension h; $a_{jh}$ is the
discrimination parameter for item j for dimension h; $b_{jh}$ is the difficulty
parameter for item j for dimension h; m is the number of dimensions; and D is
the scaling constant, 1.7. Another model discussed by Sympson (1978) is
defined as

$$P_{ij}(\boldsymbol{\theta}_i) = \frac{1}{\prod_{h=1}^{m}\left(1 + \exp\left[-D a_{jh}(\theta_{ih} - b_{jh})\right]\right)} \tag{10}$$

where all parameters are defined as above.

These two models can be distinguished by comparing their
denominators. Unlike the Sympson model, the Doody-Bogan and Yen model
contains no product of terms in the denominator. Equation 9 can
be classified as a compensatory model that permits high ability on one
dimension to compensate for low ability on another dimension in terms of the
probability of a correct response. If dimensionality is considered in the context
of factor analysis, a two-dimensional test has a group of items measuring each
dimension. A compensatory model seems reasonable because the test is
being considered as a whole (Ansley & Forsyth, 1985).

The second model, defined by Equation 10, is called a
noncompensatory model, in which high abilities on one factor are not allowed
to supplement low abilities on the second factor. When a two-dimensional test is
considered as one that requires simultaneous application of the two abilities to
answer each item correctly, the noncompensatory model seems more
appropriate (Ansley & Forsyth, 1985).
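
The practical difference between the two models is easiest to see numerically. The following sketch evaluates Equations 9 and 10 for a two-dimensional item; the function names and the example parameter values are assumptions chosen for illustration.

```python
import math

D = 1.7  # scaling constant from Equations 9 and 10


def p_compensatory(theta, a, b):
    """Equation 9: a single logistic function of the summed terms."""
    z = sum(ah * (th - bh) for ah, th, bh in zip(a, theta, b))
    return 1.0 / (1.0 + math.exp(-D * z))


def p_noncompensatory(theta, a, b):
    """Equation 10: a product of one logistic term per dimension."""
    p = 1.0
    for ah, th, bh in zip(a, theta, b):
        p *= 1.0 / (1.0 + math.exp(-D * ah * (th - bh)))
    return p


# Hypothetical item with equal discriminations and zero difficulties,
# answered by an examinee who is high on one trait and low on the other.
a, b = (1.0, 1.0), (0.0, 0.0)
theta = (2.0, -2.0)
print(p_compensatory(theta, a, b))     # the two abilities offset: 0.5
print(p_noncompensatory(theta, a, b))  # the low ability dominates: roughly 0.03
```

Under the compensatory form the surplus on the first dimension cancels the deficit on the second, whereas under the noncompensatory form the probability can never exceed the smallest per-dimension term.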



Reckase (1985) has alternatively defined the compensatory M2PL to
provide a simple framework for specifying and generating multidimensional
item response data. This model defines the probability of a correct response
as

$$P(x_{ij} = 1 \mid \mathbf{a}_j, d_j, \boldsymbol{\theta}_i) = \frac{\exp(\mathbf{a}_j'\boldsymbol{\theta}_i + d_j)}{1 + \exp(\mathbf{a}_j'\boldsymbol{\theta}_i + d_j)} \tag{11}$$

where $\mathbf{a}_j$ is a vector of discrimination parameters; $d_j$ is related to item difficulty;
and $\boldsymbol{\theta}_i$ is a vector of ability parameters. The exponent can also be written as

$$\sum_{h=1}^{m} a_{jh}(\theta_{ih} - b_{jh}) \tag{12}$$

where m is the number of dimensions; $a_{jh}$ is an element of $\mathbf{a}_j$; $\theta_{ih}$ is an element
of $\boldsymbol{\theta}_i$; and $d_j = -\sum_{h} a_{jh}b_{jh}$. When this form is used, the relationship to the more
familiar expression in Equation 9 can be seen.
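
A short check of the identity between the intercept form of Equation 11 and the difference form of Equation 12 is given below. The function names and the parameter values are assumptions made for the example; note that, as written in the text, Equation 11 carries no scaling constant D.

```python
import math


def p_m2pl(theta, a, d):
    """Equation 11: compensatory M2PL in slope-intercept form."""
    z = sum(ah * th for ah, th in zip(a, theta)) + d
    return math.exp(z) / (1.0 + math.exp(z))


def intercept_from_difficulties(a, b):
    """d_j = -sum_h a_jh * b_jh, the link between Equations 11 and 12."""
    return -sum(ah * bh for ah, bh in zip(a, b))


def exponent_difference_form(theta, a, b):
    """Equation 12: sum_h a_jh * (theta_ih - b_jh)."""
    return sum(ah * (th - bh) for ah, th, bh in zip(a, theta, b))


# Hypothetical two-dimensional item and examinee.
a, b, theta = (1.2, 0.5), (0.3, -1.0), (0.7, 0.2)
d = intercept_from_difficulties(a, b)
z_intercept = sum(ah * th for ah, th in zip(a, theta)) + d
z_difference = exponent_difference_form(theta, a, b)
print(abs(z_intercept - z_difference) < 1e-12)  # the two exponents agree
```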

The  data  described  by  a  multidimensional  IRT  model  can  be  depicted 
graphically  by  an  item  response  surface  (IRS).  Figure  2  presents  an  IRS  for 
an M2PL item. The IRS increases monotonically as the elements of $\boldsymbol{\theta}_i$
increase (Reckase, 1985).

To  identify  the  multidimensional  item  difficulty  (MID)  for  an  item,  the 
point  in  the  IRS  where  the  item  is  most  discriminating  must  be  found.  This 
point,  which  provides  the  maximum  information  about  an  examinee,  will  have 
the greatest slope. Because the slope along the IRS can differ according to
the direction taken, Reckase (1985) determined the slope using the direction
from the origin of the $\theta$ space to the point of highest discrimination.


Figure  2.  An  item  response  surface  (IRS)  based  on  the  compensatory  M2PL 


To accomplish this analysis, the model given in Equation 11 is
translated to polar coordinates, replacing each $\theta_{ih}$ by $\theta_i \cos\alpha_{jh}$, where $\theta_i$ is the
distance from the origin to $\boldsymbol{\theta}_i$ and $\alpha_{jh}$ is the angle from the hth axis to the
maximum information point (Reckase, 1985). In a two-dimensional item, the
value of $\alpha_{jh}$ can range between 0° and 90° depending on the degree to which
the item measures the two traits. If the item only measures the first trait, $\alpha_{j1}$
equals 0°, while $\alpha_{j1}$ = 90° would depict an item measuring only the second trait.
The relationship between $\alpha_{jh}$ and discrimination element $a_{jh}$ can then be stated
as

$$\cos\alpha_{jh} = \frac{a_{jh}}{\sqrt{\sum_{k=1}^{m} a_{jk}^2}}. \tag{13}$$

The MID parameters can now be expressed as

$$\mathrm{MID}_j = \frac{-d_j}{\sqrt{\sum_{k=1}^{m} a_{jk}^2}}. \tag{14}$$

Finally, an item that requires two abilities for a correct response can be
represented as a vector in the two-dimensional latent ability space. The length
of the vector for an item is equal to the degree of multidimensional
discrimination (MDISC) (Ackerman, 1991). Reckase (1985) expressed MDISC
as

$$\mathrm{MDISC}_j = \sqrt{\sum_{k=1}^{m} a_{jk}^2}. \tag{15}$$

These equations provide an excellent framework for manipulating conditions
during generation of multidimensional data.
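
As a worked illustration of Equations 13 through 15, the sketch below computes MDISC, MID, and the direction angles for a single item. The function names and the example item are assumptions; the last helper simply locates the point at signed distance MID from the origin along the item's direction, where the exponent of Equation 11 is zero.

```python
import math


def mdisc(a):
    """Equation 15: length of the discrimination vector."""
    return math.sqrt(sum(ah * ah for ah in a))


def mid(a, d):
    """Equation 14: signed distance from the origin to the item's location."""
    return -d / mdisc(a)


def direction_angles(a):
    """Equation 13: angle (in degrees) between the item and each axis."""
    length = mdisc(a)
    return [math.degrees(math.acos(ah / length)) for ah in a]


def steepest_point(a, d):
    """Coordinates reached by moving MID units along the item's direction."""
    length = mdisc(a)
    return [mid(a, d) * ah / length for ah in a]


# Hypothetical item: a = (3, 4), d = -5.
a, d = (3.0, 4.0), -5.0
print(mdisc(a))             # 5.0
print(mid(a, d))            # 1.0
print(direction_angles(a))  # about 53.1 and 36.9 degrees
```

When data are generated, choosing the angles fixes how strongly an item leans toward each trait, while MDISC and MID set its overall discrimination and difficulty.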

Many  indices  have  been  developed  to  assess  the  dimensionality  of  a 
test  and  test  items.  Hattie  (1985)  examined  over  30  of  these  indices  which 
were  grouped  into  methods  based  on  (a)  answer  patterns,  (b)  reliability,  (c) 
principal  components,  (d)  factor  analysis,  and  (e)  latent  traits.  Hattie 
concluded that none of the indices were satisfactory and only four could even
distinguish unidimensional from multidimensional data sets. A major problem
encountered  by  Hattie  in  assessing  the  indices  was  that  unidimensionality  was 
often  confused  with  reliability,  internal  consistency,  and  homogeneity. 

More  recently,  other  procedures  have  been  developed  to  assess  the 
dimensionality  of  latent  traits.  Roznowski,  Tucker,  and  Humphreys  (1991) 
explored  several  of  these  indices.  Procedures  based  on  the  shape  of  the 
curve  of  successive  eigenvalues  were  found  to  be  unsatisfactory  under  most 
conditions.  A  pattern  index  of  second  factor  loadings  was  accurate  except  with 
high  obliqueness.  The  most  accurate  index  in  this  study  was  based  on  local 
independence.  The  use  of  this  index  is  particularly  recommended  with  large 
samples  and  many  items. 

Linear  factor  analysis  has  been  widely  used  to  assess  dimensionality  of 
dichotomous  items.  However,  use  of  phi  correlations  often  leads  to 
overestimation  of  the  number  of  factors  underlying  the  responses  by 
confounding  factor  coefficients  with  item  difficulties  (Bock,  Gibbons,  &  Muraki, 
1988;  Hambleton  &  Swaminathan,  1985).  Tetrachoric  correlations  may  be 
substituted,  but  may  still  be  confounded  with  item  difficulty  or  guessing  in  real 
data  (Camilli,  1992).  Bock,  Gibbons,  and  Muraki  (1988)  have  developed  a 
maximum  likelihood  full  information  factor  analysis  procedure  as  an  attempt  to 
deal  with  these  problems. 
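
The confounding of phi correlations with item difficulty can be illustrated with a small constructed example. The data below are hypothetical: ten examinees ordered on a single trait answer an easy and a hard item in a perfectly consistent (Guttman) pattern, yet the phi coefficient is far below 1 because the item difficulties differ.

```python
import math


def phi(x, y):
    """Phi coefficient (Pearson correlation) for two 0/1 response vectors."""
    n = len(x)
    px = sum(x) / n
    py = sum(y) / n
    pxy = sum(u * v for u, v in zip(x, y)) / n
    return (pxy - px * py) / math.sqrt(px * (1 - px) * py * (1 - py))


# Examinees are sorted from lowest to highest ability.
easy = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]  # passed by the top eight
hard = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]  # passed by the top two

print(phi(easy, hard))  # 0.25, although one trait explains both items
print(phi(easy, easy))  # 1.0 when difficulties match
```

A factor analysis of such coefficients can therefore group items by difficulty rather than by content, which is the overestimation problem described above.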

Another approach to dimensionality taken by Stout (1990) replaced the
strong assumptions of unidimensionality and local independence with less
restrictive assumptions of essential unidimensionality and essential
independence. Stout contended that a dominant dimension results when an
attribute overlaps many items, and that other dimensions common to only a
few items are unavoidable in reality but are not significant. These minor
dimensions are rarely discussed in IRT literature, but are a frequent theme in
classical factor analysis. While the IRT definition of dimensionality would take
all factors, major and minor, into account, essential dimensionality is a
mathematical conceptualization of the number of dominant dimensions with
minor dimensions ignored. An essentially unidimensional test is therefore any
set of items selected from an infinite item pool that measures exactly one
major dimension. When essential unidimensionality is assumed, latent ability
is unique in an ordinal scaling sense and this unique latent ability is estimated
consistently. Stout presented theorems and proofs to show that dimensions
distributed nondensely over items or dimensions that have a minor influence
on possibly many items do not necessarily negate essential unidimensionality.
He went on to present guidelines for development of essentially
unidimensional tests. Among the recommendations are limiting the number of
abilities per item; keeping the number of items dependent on the same ability,
other than the intended-to-be-measured $\theta$, small; and controlling the number of
item pairs assigned to the same ability other than $\theta$. These conditions are
usually met by the carefully designed tests found in practice.


Nandakumar  (1991)  used  simulations  to  investigate  Stout's  statistical 
test  of  essential  unidimensionality.  When  one  dominant  trait  and  one  or  more 
minor  dimensions  having  little  influence  on  item  scores  were  present,  Stout's 
test  performed  well  in  indicating  essential  unidimensionality.  The  test  is  more 
likely  to  reject  the  hypothesis  of  essential  unidimensionality  as  the  effect  of  the 
minor  dimensions  increases. 

To  facilitate  application  of  the  test  of  essential  unidimensionality,  Stout 
developed  the  computer  program  DIMTEST.  An  investigation  of  the  program 
revealed  problems  when  a  test  consisted  of  difficult,  highly  discriminating  items 
where  guessing  was  also  present  (Nandakumar  &  Stout,  1993).  Refinements 
were  subsequently  made  to  the  program  to  make  it  more  robust  and  beneficial 
to  the  measurement  practitioner. 

Nandakumar  (1994)  studied  three  commonly  used  methodologies  for 
assessing dimensionality in a set of item responses. The three procedures
(DIMTEST, Holland and Rosenbaum's approach, and nonlinear factor analysis)
were unreliable in detecting lack of unidimensionality in real data sets.

Although  the  more  recent  procedures  based  on  local  independence,  full 
information  factor  analysis,  and  essential  unidimensionality  offer  promise  for 
assessing  the  dimensionality  of  dichotomous  data,  especially  with  large 
datasets,  a  satisfactory  method  has  not  yet  been  agreed  upon  by 
measurement  researchers.  Because  of  the  current  lack  of  an  acceptable  index 
to detect multidimensionality, it becomes even more urgent to understand
exactly  what  effect  violation  of  the  unidimensionality  assumption  may  have  on 
IRT  applications.  When  a  test  measures  several  dimensions,  examinees' 
scores  will  be  influenced  by  all  of  these  factors.  As  a  result,  systematic  and 
unsystematic  errors  of  equating  might  be  expected  from  scaling  and  equating 
procedures  that  are  applied  to  multidimensional  tests  (Yen,  1984).  The 
estimation  of  ability  and  item  parameters  is  likely  to  be  affected  also. 
Multidimensionality and Parameter Estimation

Violation  of  the  unidimensionality  assumption  has  been  suggested  as  a 
problem  in  the  estimation  of  item  and  ability  parameters,  the  first  step  in  IRT 
equating  procedures.  Thus,  it  is  important  to  determine  how  robust  estimation 
procedures  are  to  this  violation. 

Ansley and Forsyth (1985) used a noncompensatory M3PL model to
simulate a two-dimensional dataset. The two discrimination parameters were
set to have respective means of 1.23 and .49 and respective standard
deviations of .34 and .11. The b values were scaled to reflect fairly easy items
($\mu_{b_1}$ = -.33, $\sigma_{b_1}$ = .82, $\mu_{b_2}$ = -1.03, $\sigma_{b_2}$ = .82). The c parameter was set to .2. A
bivariate normal distribution was selected to generate the $\theta$ vectors with both
dimensions scaled to have mean 0 and standard deviation 1.0. The
correlation $\rho(\theta_1, \theta_2)$ was varied with values of 0.0, .3, .6, .9, and .95 simulated.
Four combinations of sample size (1,000 and 2,000) and test length (30, 60)
were examined. Corresponding unidimensional datasets were also simulated.
Correlations of the estimated and simulated parameters showed the $a_i$
estimates appeared to be averages of the true $a_1$ and $a_2$ values. The $b_i$
estimates overestimated the true $b_1$ values. The $\theta$ estimates were highly
related to the averages of the true $\theta$ values. The authors concluded that item
parameter estimation was affected by violation of the unidimensionality
assumption, but as the $\theta$ vectors became more highly correlated, the
estimations derived from the two-dimensional dataset approached results
obtained from the unidimensional data. Sample size and test length had little
effect on any of the relationships.
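
A data-generation step of this kind might be sketched as follows. This is not the authors' code: the noncompensatory three-parameter form (a guessing parameter added to the product in Equation 10), the function names, the use of identical items, and the seed handling are all assumptions made for illustration. Correlated abilities are drawn by mixing two independent standard normal deviates.

```python
import math
import random

D = 1.7  # logistic scaling constant


def draw_theta(rho, rng):
    """One (theta1, theta2) pair: standard bivariate normal, correlation rho."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2


def p_noncomp_3pl(theta, a, b, c):
    """Assumed noncompensatory M3PL: c + (1 - c) * product of logistic terms."""
    p = 1.0
    for ah, th, bh in zip(a, theta, b):
        p *= 1.0 / (1.0 + math.exp(-D * ah * (th - bh)))
    return c + (1.0 - c) * p


def simulate(n_persons, items, rho, seed=0):
    """Return an n_persons x n_items matrix of 0/1 responses."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_persons):
        theta = draw_theta(rho, rng)
        row = [int(rng.random() < p_noncomp_3pl(theta, a, b, c))
               for a, b, c in items]
        data.append(row)
    return data


# Hypothetical 30-item test in which every item takes the mean parameter
# values quoted in the text; a fuller simulation would sample them per item.
items = [((1.23, 0.49), (-0.33, -1.03), 0.2)] * 30
responses = simulate(1000, items, rho=0.6, seed=1)
```

Repeating the call across the correlation values, sample sizes, and test lengths listed above would produce the two-dimensional datasets; the unidimensional calibration itself is outside the scope of this sketch.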

Reckase  (1979)  studied  five  forms  of  the  Missouri  State  Testing 
Program  and  five  datasets  simulated  to  match  various  factor  structures  to 
determine  what  characteristics  are  estimated  by  the  unidimensional  Rasch  and 
3PL  models  when  the  data  are  multidimensional.  Reckase  concluded  that  for 
tests  with  several  equally  strong  dimensions,  the  Rasch  estimates  should  be 
considered  as  a  sum  or  average  of  the  abilities  required  for  each  dimension. 
For  data  with  a  dominant  first  factor,  the  Rasch  and  3PL  difficulty  estimates 
were  highly  correlated  with  the  scores  for  that  factor.  With  the  3PL  model  and 
more than two potent factors, the $b_i$ estimates correlated with just one of the
common factors. The author concluded that good ability estimates can be
obtained from unidimensional estimation procedures when the first factor
accounts for at least 20 percent of the test variance, as is likely in practice.

Yen (1984) used data simulated with a compensatory M3PL model and
data from the Comprehensive Test of Basic Skills, Form U (CTBS/U) to study
unidimensional parameter estimation of multidimensional data. A variety of $a_i$
parameters were configured and $\rho(\theta_1, \theta_2)$ was set at .5 or .6. When
multidimensionality was present, the $a_i$ and $b_i$ parameter estimates were
larger than those of unidimensional sets of items. The unidimensional
estimates of both $a_i$ and $\theta$ parameters appeared to be a combination of the
respective two-dimensional parameters.

Data simulated from a hierarchical factor model were used in a study by
Drasgow and Parsons (1983). Item responses were generated from five
oblique common factors. Loadings were varied, producing diversity in
correlations between the common factors. Each simulated dataset consisted
of 50-item tests and 1,000 simulees. The general latent trait was recovered
well when the correlations between the common factors were .46 or higher.

Harrison  (1986)  also  used  a  hierarchical  factor  model  to  simulate  data. 
The  strength  of  the  second-order  general  factor,  the  number  of  first-order 
common  factors,  the  distribution  of  items  loading  on  the  common  factors,  and 
the  number  of  test  items  were  manipulated.  The  effect  of  test  length  was 
significant.  As  the  number  of  items  increased,  the  general  trait  was  recovered 
more  effectively  regardless  of  the  latent  structure,  distribution  of  items  across 
common factors, or the number of common factors. Estimation of the $b_i$
parameters was found to be robust to violations of unidimensionality. The
estimation of both the $a_i$ and $b_i$ parameters improved as test length and
strength of the general factor increased. In general, Harrison found
unidimensional parameter estimation procedures to be robust in the presence
of multidimensional data.

The  studies  reviewed  indicate  that  IRT  parameters  implied  by  the 
general  factor  are  recovered  well  when  the  common  factors  have  sufficiently 
high  correlations.  Reckase,  Ackerman,  and  Carlson  (1988)  used  both 
simulated  and  empirical  data  to  demonstrate  that  items  can  be  selected  to 
construct  a  test  that  meets  the  unidimensionality  assumption  even  though 
more  than  one  ability  is  required  for  a  correct  response.  The  authors  showed 
that  the  unidimensionality  assumption  only  requires  the  items  in  a  test  to 
measure the same composite of abilities. This requirement seems to have been
met in the previous investigations. Based on this study, it appears that the
unidimensionality assumption is not as restrictive as formerly thought.

Although  these  studies  explored  the  effect  of  multidimensionality  on 
unidimensional  parameter  estimation,  it  is  also  important  to  understand  what 
effect  the  choice  between  compensatory  and  noncompensatory 
multidimensional  models  may  have  on  estimation.  Ackerman  (1989)  simulated 
two-dimensional  data  using  both  compensatory  and  noncompensatory  M2PL 
models.  Forty  two-dimensional  items  were  generated  using  the  compensatory 
model. Difficulty was confounded with dimensionality and $\rho(\theta_1, \theta_2)$ was
set at 0.0, .3, .6, and .9. For each compensatory item, a corresponding
noncompensatory item was created using a least-squares approach to
minimize the quantity

$$\sum_{j=1}^{1000} \left[ (P_C \mid \boldsymbol{\theta}_j, \mathbf{a}, \mathbf{b}) - (P_{NC} \mid \boldsymbol{\theta}_j, \mathbf{a}, \mathbf{b}) \right]^2 \tag{16}$$

where $P_C$ is a given compensatory item's probability of a correct response and
$P_{NC}$ is the noncompensatory item's probability of a correct response, which
varies as a function of a and b given $\theta$. The unidimensional 2PL model was
used to estimate parameters using both BILOG and LOGIST. The author
discovered minimal differences in the IRS for each model when the parameters
were matched. The confounding of difficulty with dimensionality was only
detected by BILOG. For both models, as $\rho(\theta_1, \theta_2)$ increased, the response
data became more unidimensional and estimation of all parameters improved.
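
The matching criterion in Equation 16 can be evaluated directly, as in the rough sketch below. Several simplifications are assumptions made here and are not taken from Ackerman (1989): the scaling constant is omitted from both models, the 1,000 ability points are drawn from independent standard normals, the noncompensatory discriminations are held fixed, and a coarse grid over the two difficulties stands in for a proper least-squares routine.

```python
import math
import random


def p_comp(theta, a, d):
    """Compensatory M2PL in the form of Equation 11."""
    z = sum(ah * th for ah, th in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))


def p_noncomp(theta, a, b):
    """Noncompensatory M2PL: product of per-dimension logistic terms."""
    p = 1.0
    for ah, th, bh in zip(a, theta, b):
        p *= 1.0 / (1.0 + math.exp(-ah * (th - bh)))
    return p


def criterion(thetas, comp_item, a_nc, b_nc):
    """Equation 16: summed squared difference between the two surfaces."""
    a_c, d_c = comp_item
    return sum((p_comp(t, a_c, d_c) - p_noncomp(t, a_nc, b_nc)) ** 2
               for t in thetas)


def grid_match(thetas, comp_item, a_nc, b_values):
    """Crude stand-in for least squares: best (b1, b2) on a grid."""
    best = None
    for b1 in b_values:
        for b2 in b_values:
            loss = criterion(thetas, comp_item, a_nc, (b1, b2))
            if best is None or loss < best[0]:
                best = (loss, (b1, b2))
    return best
```

A real application would also free the discriminations and use a continuous optimizer; the grid is only meant to make the objective concrete.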

Way,  Ansley,  and  Forsyth  (1988)  also  compared  compensatory  and 
noncompensatory models with simulated data. The values assigned $\rho(\theta_1, \theta_2)$
ranged from 0.0 to .95. Results showed the number-right distributions for the
two models were comparable. In the noncompensatory model, the
unidimensional $a_i$ estimates appeared to be averages of the $a_1$ and $a_2$ values,
while the compensatory model provided $a_i$ estimates best considered as sums
of $a_1$ and $a_2$. The $b_i$ estimates for the noncompensatory data were greater
than $b_1$ values, while the compensatory model seemed to average the $b_1$ and
$b_2$ values. For both models, the $\theta$ estimates were related to the average of the
two $\theta$ parameters.

A summary of the studies investigating the effect of multidimensional
data on unidimensional IRT parameter estimation is presented in Table 3.
Generally, parameters appear to be recovered adequately with data fit
conditions usually found in practice. Both compensatory and
noncompensatory models are apparently viable as MIRT models. Determining
the adequacy of unidimensional parameter estimation of multidimensional data
has important consequences for equating multidimensional tests.

[Table 3. Summary of studies of the effect of multidimensional data on unidimensional IRT parameter estimation]

In  addition  to  the  estimation  procedures  discussed,  the  relationship 
between  multidimensional  and  unidimensional  IRT  models  can  also  be 
approached  from  an  analytical  framework.  Wang  (1986),  as  reported  in 
Ackerman  (1988)  and  Oshima  and  Miller  (1990),  determined  explicit  algebraic 
relationships  between  unidimensional  estimates  and  the  true  multidimensional 
parameters  for  the  case  in  which  the  underlying  response  process  is  modeled 
by  the  compensatory  M2PL  model  and  the  unidimensional  2PL  model.  Using 
the  results  for  unidimensional  estimation  of  a  multidimensional  data  matrix, 
Wang  concluded  that  the  unidimensional  item  parameter  estimates  are 
obtained  as  a  weighted  composite  of  the  underlying  traits.  The  weights  are  a 
function  of  the  discrimination  vectors  for  the  items,  the  correlations  among  the 
latent traits, and the difficulty parameters of the items. For group g, which can
be described as having a diagonal variance-covariance structure $\Omega_g$ and a mean
ability vector $\mu$, the 2PL item parameters for two-dimensional item j can be
approximated by

$$a_j = \frac{\mathbf{a}_j'\mathbf{W}_1}{\sqrt{2.89 + \mathbf{a}_j'\mathbf{W}_2\mathbf{W}_2'\mathbf{a}_j}} \tag{17}$$


6.-^5=  (18) 


where $\mathbf{a}_j$ is the discrimination vector for the M2PL model; $d_j$ is the difficulty
parameter for the M2PL model; $\mathbf{W}_1$ and $\mathbf{W}_2$ are the first and second
standardized eigenvectors of the matrix $L'A'AL$, where A is the matrix of
discrimination parameters for all items in the test and $L'L = \Omega$. Therefore,
when the means, standard deviations, and item parameters of a
two-dimensional distribution are known, the corresponding 2PL unidimensional
item parameters can be approximated.
Multidimensionality and IRT Equating

In  practice,  test  equating  almost  exclusively  assumes  unidimensionality. 
A  single  score  from  one  test  is  transformed  to  a  single  score  from  another  test. 
An  understanding  of  what  effect  the  presence  of  multidimensional  data  has  on 
these  unidimensional  equating  results  is  of  paramount  importance. 

Dorans  and  Kingston  (1985)  equated  four  forms  of  the  Verbal  GRE 
Aptitude  Test  using  the  3PL  model  and  an  equated  bs  procedure.  Two  data 
collection  designs,  equivalent  groups  and  anchor-test,  were  investigated  as 
well  as  several  variations  in  calibration  procedures.  Dimensionality  was 
assessed  through  factor  analyses  conducted  at  the  item  level  on  interitem 
tetrachoric  correlations.  Two  highly  related  verbal  dimensions  were  identified. 



To  examine  their  results,  the  researchers  first  calibrated  the  whole  test, 
then  divided  the  test  items  into  two  homogeneous  subgroups.  The  subgroups 
were  recalibrated  separately  and  placed  on  the  same  scale  as  the  original  test. 
They  were  then  recombined  back  into  an  entire  test  and  their  corresponding 
ICCs  were  compared.  The  authors  discovered  that  differences  in  magnitude  of 
discrimination  parameter  estimates  had  an  impact  on  IRT  equating  results, 
affecting  the  symmetry  of  the  equating.  However,  the  different  research 
combinations  yielded  very  similar  equatings,  leading  the  authors  to  conclude 
that  IRT  equating  may  be  sufficiently  robust  to  the  dimensionality  displayed  in 
their  data. 

Cook and Eignor (1988) used SAT data that were suspected to be
multidimensional  to  examine  the  robustness  of  3PL  model  concurrent 
calibration  and  the  characteristic  curve  transformation  procedures.  Scale  drift 
was  used  as  the  criterion  for  evaluating  equating  results.  Cook  and  Eignor 
concluded  that  both  IRT  equating  methods  produced  acceptable  results 
despite  the  multidimensionality  present  in  the  tests  being  studied. 

In  addition  to  studying  parameter  estimation,  Yen  (1984)  equated  the 
LOGIST  trait  estimates  for  both  real  (CTBS/U)  and  simulated  data.  Several 
statistics were used to evaluate the results: (1) the correlation r; (2)
standardized difference between means (SDM); (3) ratio of standard
deviations; and (4) standardized root mean squared difference (SRMSD).
Trait estimates based on items that measured different dimensions had lower
correlations and higher SDMs and SRMSDs. That is, when tests measuring
different  dimensions  were  equated,  large  unsystematic  errors  occurred. 
Systematic  errors  were  found  only  when  the  tests  measured  several 
dimensions  that  differed  in  difficulty  and  were  likely  to  be  taught  sequentially, 
as  in  a  vertical  equating  situation. 

Camilli,  Wang,  and  Fesq  (1995)  adapted  the  methodology  of  Dorans 
and  Kingston  (1985)  to  examine  how  multidimensionality  may  affect  the 
equating  of  the  Law  School  Admission  Test  (LSAT).  Two  dimensions  of  the 
LSAT  were  identified  using  primary  and  secondary  factor  analyses,  and  the 
stability  of  the  dimensions  was  established  over  six  administrations.  The  test 
was  divided  into  two  homogeneous  subtests  to  study  the  effect  of 
multidimensionality  on  IRT  true-score  test  equating.  Item  calibration  was  done 
with  BILOG.  The  authors  found  very  small  differences  in  the  equatings  except 
at  the  ends  of  the  raw  score  distribution.  They  concluded  that,  for  the  LSAT, 
IRT  true-score  equating  was  robust  to  the  presence  of  multidimensionality. 

These empirical studies indicate that violations of the unidimensionality
assumption, while having some impact on results, may not be significant.
However, different tests were used in this research and their content may have
affected findings in an unknown manner. Therefore, the results are difficult
to generalize across studies (Skaggs & Lissitz, 1986a). Also,
because indices designed to detect multidimensionality are generally
unsatisfactory, it is necessary to design research studies that permit
manipulation of independent variables to understand exactly how violations of
the unidimensionality assumption affect equating. Simulation studies present a
technique to manipulate and control the desired variables.

There  has  been  little  simulation  research  on  the  effects  of 
multidimensionality  on  unidimensional  IRT  equating.  One  notable  exception  is 
a  study  by  Doody-Bogan  and  Yen  (1983).  The  main  purpose  of  this  paper  was 
to  examine  the  stability  of  several  chi-square  statistics  for  their  ability  to  detect 
multidimensionality  in  vertical  equating,  but  the  findings  are  significant  in  the 
context  of  unidimensional  equating  with  multidimensional  data.  Four 
multidimensional  data  configurations  were  simulated  with  the  compensatory 
M3PL  model  described  in  Equation  9.  One  unidimensional  3PL  dataset  was 
also  generated.  Three  differences  in  mean  ability  between  the  two  tests  to  be 
equated  were  simulated  with  parameter  estimates  for  all  data  modelled  after 
the  CTBS  for  realism.  Correlations,  standardized  difference  between  means 
(SDM),  and  standardized  root  mean  square  differences  (SRMSD)  were  used  to 
evaluate  results.  The  findings  of  this  study  were  mixed.  When  the  correlations 
were  examined,  the  results  of  the  equatings,  both  horizontal  and  vertical,  were 
as  good  for  the  tests  with  multidimensional  configurations  as  for  the 
unidimensional  tests.  On  the  other  hand,  when  the  means  were  used  as  the 
criterion  for  comparison,  the  multidimensional  tests  provided  worse  equatings 
than the unidimensional data, especially when the tests differed in difficulty.
Another concern raised was that the equatings might deteriorate if the factors
loaded  differently  on  the  two  tests. 

More  recently,  attempts  have  been  made  to  develop  a  multidimensional 
equating  procedure.  Hirsch  (1989)  conducted  a  study  in  which  real  and 
simulated  data  were  equated  with  a  multidimensional  method.  The  procedure 
involves  (a)  estimating  item  parameters  and  abilities  on  both  dimensions  for 
both  tests,  (b)  identifying  common  basis  vectors,  (c)  aligning  basis  vectors 
through Procrustes rotation, and (d) equating means and standard deviations of
the ability estimates for each dimension of the two tests. Results of this
preliminary research indicated that effective equating was possible with these
techniques, but the instability of the ability estimates makes it impractical at this
time. While work on development of MIRT equating is continuing (Hirsch &
Miller,  1991),  the  procedure  has  little  current  value  for  the  equating  needs  of 
testing  companies.  The  results  of  the  studies  of  unidimensional  equating  with 
multidimensional  data  are  summarized  in  Table  4. 

The  emphasis  of  the  present  study  was  to  examine  the  effect  of 
multidimensional  data  on  unidimensional  IRT  equating  through  the  use  of  a 
simulation  study.  The  research  questions  chosen  were  those  considered  to  be 
of  most  value  to  the  practitioner. 


[Table 4, summarizing the studies of unidimensional equating with multidimensional data, appeared here as a rotated full-page table; its contents are not recoverable from this copy.]

CHAPTER  3 
METHOD 


Purpose 
Introduction 

The  purpose  of  this  study  was  to  examine  the  effects  of 
multidimensional  data  on  unidimensional  equating  procedures.  The  effects  of 
the  number  of  multidimensional  items,  type  of  multidimensional  model,  and 
choice  of  equating  procedure  were  investigated.  Most  investigations  were 
conducted  with  randomly  equivalent,  normally  distributed  examinee  groups 
having mean 0 and standard deviation 1. In addition, data from examinee
groups of lower ability (X̄_1 = -0.8, SD_1 = 0.6) were equated to results obtained
from  the  randomly  equivalent  groups. 

The  methods  applied  to  investigate  these  effects  are  described  in  this 
chapter.  The  methodology  is  discussed  in  the  following  sections:  (a)  data 
generation,  (b)  estimation  of  parameters,  (c)  equating,  and  (d)  criteria  for 
evaluation. 
Research  Questions 

The  specific  questions  to  be  answered  in  the  present  study  were: 
1. Does the number of multidimensional items in a test affect
unidimensional  equating  results? 



2.  Does  the  equating  procedure  affect  unidimensional  equating  results? 

3.  Do  data  simulated  by  using  a  compensatory  multidimensional  model 
produce  different  unidimensional  equating  results  than  data  simulated  using  a 
noncompensatory  model? 

4.  Are  unidimensional  equating  results  affected  by  differing  ability 
distributions  of  the  two  examinee  groups? 

Data  Generation 
Design 

Data  for  two  parallel  forms,  A  and  B,  of  each  test  condition  were 
simulated.  Four  test  conditions  were  created  by  varying  the  number  of 
multidimensional  items  contained  in  each  test.  These  conditions  were  created 
to  mirror  what  might  be  found  in  published  tests.  For  example,  in  a  test  of 
mathematics  problem  solving,  all  items  might  be  multidimensional  to  some 
degree  if  reading  skill  were  also  required.  However,  relatively  few 
multidimensional  items  might  be  found  in  a  reading  comprehension  test 
containing  only  one  graph-reading  passage  that  also  needed  a  math  skill  for 
completion. In the present study, 10, 20, 30, and 40 items of a 40-item test
were  two-dimensional.  These  conditions  are  referred  to  as  MD10,  MD20, 
MD30,  and  MD40  respectively. 

In  addition  to  modifying  the  number  of  multidimensional  items,  the 
strength  of  each  multidimensional  item's  first  factor  was  manipulated.  This 
was  done  within  each  test  condition  because  it  is  unreasonable  to  expect  a 
published test to contain multidimensional items that all have an identical
factor structure. The angle of item direction was varied to 20°, 30°, 45°, and
60°  to  reflect  items  that  predominantly  measure  the  first  trait  (20°  and  30°), 
both  traits  equally  (45°),  and  the  second  trait  (60°). 

Finally,  data  were  originally  generated  using  a  compensatory 
multidimensional  model.  To  investigate  any  variations  due  to  the  difference  in 
modeling,  each  compensatory  dataset  was  transformed  into  its  corresponding 
noncompensatory  parameters  through  application  of  the  least-squares 
approach  used  by  Ackerman  (1989)  and  described  in  Chapter  2. 
Noncompensatory  parameters  were  considered  corresponding  if  the  probability 
of  a  correct  response  was  the  same  as  for  the  compensatory  parameters.  This 
was  accomplished  through  the  NLIN  procedure  in  the  Statistical  Analysis 
System (SAS, 1989). Specific methodology is discussed later in this chapter.
Model  Description 

To  avoid  problems  associated  with  estimating  the  lower  asymptote,  the 
compensatory  multidimensional  two-parameter  logistic  (M2PL)  model 
(Reckase,  1985)  was  selected  for  data  generation.  Because  this  is  a 
compensatory  model,  high  abilities  on  one  ability  trait  are  allowed  to 
compensate  for  lower  abilities  on  the  second  ability  trait. 

The multidimensional item difficulty (MID_i) parameter was defined by
Reckase as in Equation 14, where a_ik is the kth element of a_i and m is the
number of dimensions. The data of interest in this study were considered to
be two-dimensional, so m equaled 2. Multidimensional item difficulty is the
distance from the origin of the multidimensional ability space to the point where
the item provides maximum examinee information, or where the IRS has the
steepest slope. A line joins these points at angle α_ik. In a two-dimensional
item, the value of α_ik can range between 0° and 90° depending on the degree
to which the item measures the two traits. If the item only measures the first
trait, α_i1 equals 0°, while α_i1 = 90° would depict an item measuring only the
second trait. For this study, α_i1 was set to either 0°, 20°, 30°, 45°, or 60°.
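The link between the polar description of an item (MDISC, MID, and the angle with the first axis) and the M2PL parameters can be illustrated with a short Python sketch. It assumes the usual relations a_1 = MDISC cos α_i1, a_2 = MDISC sin α_i1, and MID = -d / MDISC; Equations 13 through 15 lie outside this section, so these relations and the function names are assumptions for illustration, although they reproduce the tabled values for item 4 of Table 5 to rounding.

```python
import math

def m2pl_from_polar(mdisc, mid, angle_deg):
    """Return (a1, a2, d) for a two-dimensional item.

    angle_deg is the angle between the item direction and the first axis.
    """
    alpha = math.radians(angle_deg)
    a1 = mdisc * math.cos(alpha)   # discrimination on the first trait
    a2 = mdisc * math.sin(alpha)   # discrimination on the second trait
    d = -mdisc * mid               # assumed relation MID = -d / MDISC
    return a1, a2, d

def polar_from_m2pl(a1, a2, d):
    """Return (MDISC, MID, angle in degrees) from (a1, a2, d)."""
    mdisc = math.hypot(a1, a2)
    return mdisc, -d / mdisc, math.degrees(math.atan2(a2, a1))

# Item 4 of Table 5: MDISC = 1.472, MID = -0.814, angle = 60 degrees
a1, a2, d = m2pl_from_polar(1.472, -0.814, 60)
```

Because the tabled MDISC and MID values are themselves rounded, agreement with the tabled a_1, a_2, and d values holds only to about the third decimal place.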
Item  Parameters 

Four  tests  with  40  items  each  were  simulated  using  the  compensatory 
M2PL  model  described  above.  Forty  items  were  selected  as  sufficient  to 
provide  good  equating  results.  An  anchor  test  design  was  chosen  for  data 
collection as it is widely used by practitioners (Skaggs & Lissitz, 1986a). Each
test  consisted  of  two  forms  with  12  common  linking  items  and  28  unique  items. 
The  difficulty  values  were  selected  to  be  reasonable  for  published  tests.  Lord 
(1968) found difficulties ranging from -1.5 to 2.5 (X̄ = 0.58, SD = 0.87) on SAT
Verbal data. Doody-Bogan and Yen (1983) employed a range of b_i of -2.0 to
1.52 (X̄ = -0.028, SD = 0.818) in a simulation designed to imitate CTBS-U data.
In a study using multidimensional data, Ackerman (1988) reported MID values
ranging from -0.73 through 1.87 on an ACT Mathematics test. Oshima and
Miller  (1990)  used  MID  values  in  the  interval  -2.0  to  2.0.  For  the  purpose  of 
this  investigation,  multidimensional  item  difficulty  parameters  (MID)  were 
generated using the RANNOR function of SAS. Values were chosen randomly
from a normal distribution with mean 0 and standard deviation 1.0, restricted
to the range -2.0 through 2.0.

The  multidimensional  discrimination  parameters  (MDISC)  defined  by 
equation  15  were  randomly  selected  from  a  lognormal  distribution.  A  majority 
of  MDISC  values  lay  between  .5  and  2.5  with  mean  1.15  and  standard 
deviation  .60.  These  values  correspond  to  those  reported  by  Doody-Bogan 
and  Yen  (1983)  of  .5  to  2.00  with  mean  1.03  and  standard  deviation  .3387. 
Ackerman  (1988)  found  an  MDISC  range  of  .58  through  2.39. 
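The two sampling steps just described can be sketched in Python as follows. This is not the SAS code used in the study. The sketch assumes that the restriction of MID to the interval -2.0 through 2.0 was achieved by rejecting out-of-range normal draws, and that the lognormal distribution for MDISC was parameterized to have mean 1.15 and standard deviation .60; both choices, the seed, and the function names are illustrative assumptions.

```python
import math
import random

def draw_mid(rng, low=-2.0, high=2.0):
    """Draw one MID value from N(0, 1), rejecting values outside [low, high]."""
    while True:
        x = rng.gauss(0.0, 1.0)
        if low <= x <= high:
            return x

def lognormal_params(mean, sd):
    """Log-scale mu and sigma that give a lognormal with the requested mean and SD."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)

rng = random.Random(1996)                 # arbitrary seed for reproducibility
mu, sigma = lognormal_params(1.15, 0.60)  # target mean and SD of MDISC
mid_values = [draw_mid(rng) for _ in range(68)]
mdisc_values = [rng.lognormvariate(mu, sigma) for _ in range(68)]
```

Note that rejecting draws outside the interval makes the standard deviation of the retained MID values slightly smaller than 1.0.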

To  create  two  40  item  test  forms,  68  items  were  generated  for  each  test 
condition.  The  first  12  items  in  each  set  were  identified  as  the  linking  items 
and  were  common  to  both  forms.  Items  13  through  40  were  unique  items  for 
Form  A  and  items  41  through  68  were  unique  to  Form  B.  In  order  to  simulate 
two-dimensional items, the values of α_i1 as expressed in Equation 13 varied.
In the case of unidimensional items, α_i1 was set to 0°. For two-dimensional
items, α_i1 was either 20°, 30°, 45°, or 60°. Those items with α_i1 = 20° or 30°
primarily measured the first trait. Items having α_i1 = 45° measured both traits
equally, and those with α_i1 = 60° discriminated on the second factor more
heavily. More multidimensional items in this study predominantly measured
the first factor because it is reasonable to anticipate this to occur in a
well-designed commercial test. These four α_i1 values were spiraled throughout the
items in each dataset. To illustrate, in MD40 α_i1 was 20° for item 1, 30° for
item 2, 45° for item 3, and 60° for item 4. This pattern then repeated for the 64
remaining items.

For  datasets  containing  both  unidimensional  and  two-dimensional  items, 
the  last  3,  6,  and  9  linking  items  were  multidimensional  for  MD10,  MD20,  and 
MD30 respectively. Thus the linking test had the same proportion of
unidimensional items as did the corresponding unique items in each condition.
The  last  7,  14,  and  21  unique  items  for  each  of  Forms  A  and  B  were  also 
multidimensional.  Table  5  presents  the  item  parameters  for  Form  A  of  MD30 
with  75%  of  the  items  in  each  form  being  two-dimensional. 
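The item layout described in this section can be summarized in a brief Python sketch. It assumes that the spiraled angle depends only on the item number (20°, 30°, 45°, 60°, repeating) and that unidimensional items are simply assigned 0°; this matches the Form A angles listed in Table 5, but the treatment of the Form B unique items and the function name are assumptions.

```python
ANGLES = (20, 30, 45, 60)     # spiraled item directions, in degrees
N_LINK, N_UNIQUE = 12, 28     # linking items and unique items per form

def item_layout(n_md):
    """Map item number (1-68) to (form label, angle in degrees).

    n_md is the number of two-dimensional items in each 40-item form.
    """
    md_link = N_LINK * n_md // 40      # 3, 6, 9, or 12 linking items
    md_unique = n_md - md_link         # 7, 14, 21, or 28 unique items per form
    layout = {}
    for item in range(1, N_LINK + 2 * N_UNIQUE + 1):
        if item <= N_LINK:
            form, first_md = "A,B", N_LINK - md_link + 1
        elif item <= N_LINK + N_UNIQUE:
            form, first_md = "A", N_LINK + N_UNIQUE - md_unique + 1
        else:
            form, first_md = "B", N_LINK + 2 * N_UNIQUE - md_unique + 1
        # The last items of each block are two-dimensional; the rest get 0 degrees.
        angle = ANGLES[(item - 1) % 4] if item >= first_md else 0
        layout[item] = (form, angle)
    return layout

md30 = item_layout(30)   # items 4-12, 20-40, and 48-68 are two-dimensional
```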
Response  Data 

For  each  experimental  condition  and  form,  response  vectors  for  1,000 
simulees  were  generated.  This  sample  size  was  selected  as  being  adequate 
to  provide  stable  parameter  estimates.  The  ability  values  were  randomly 
generated  through  the  normal  distribution  RANNOR  function  of  SAS  to  range 
from  approximately  -3.00  to  3.00.  The  theta  values  were  assumed  to  be 
uncorrelated.  Probabilities  of  correctly  answering  an  item  were  then  calculated 
for each simulee through application of Equation 11. Finally, the SAS function
RANUNI was used to produce a random number from the uniform distribution
between 0 and 1. If this number was less than or equal to P(X_ij = 1 | a_i, d_i, θ_j),
the  simulee  passed  the  item.  If  the  random  number  was  greater,  the  simulee 
failed.  To  increase  confidence  in  results,  twenty  sets  of  response  data  were 
generated  for  each  condition  and  form. 
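A minimal sketch of the response-generation steps described above, written in Python rather than with the SAS functions used in the study. It assumes the compensatory M2PL has the logistic form with exponent a'θ + d and no scaling constant; Equation 11 lies outside this section and may differ in sign convention or scaling. The sketch also does not restrict θ to the approximate range of -3 to 3. The function names and the seed are illustrative.

```python
import math
import random

def p_compensatory(theta, a, d):
    """Compensatory M2PL probability for one simulee and one item (assumed form)."""
    z = sum(ak * tk for ak, tk in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))

def simulate_responses(items, n_simulees, rng):
    """items is a list of ((a1, a2), d); returns (thetas, 0/1 response matrix)."""
    # Uncorrelated standard normal abilities on the two traits
    thetas = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_simulees)]
    # A simulee passes an item when a uniform draw does not exceed the probability
    responses = [
        [1 if rng.random() <= p_compensatory(th, a, d) else 0 for a, d in items]
        for th in thetas
    ]
    return thetas, responses

rng = random.Random(0)   # arbitrary seed
items = [((0.736, 1.275), 1.199), ((1.159, 0.422), 0.681)]  # items 4 and 5, Table 5
thetas, responses = simulate_responses(items, 1000, rng)
```

Repeating the call with different seeds would give the replicated response sets for a condition and form.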


Table 5

Simulated Compensatory Parameters for MD30, Form A

Item  Form  α_i1      a_1      a_2      d_i    MDISC      MID
   1  A,B      0    0.475    0.000   -0.584    0.475    1.231
   2  A,B      0    0.563    0.000   -0.173    0.563    0.308
   3  A,B      0    0.515    0.000    0.652    0.515   -1.266
   4  A,B     60    0.736    1.275    1.199    1.472   -0.814
   5  A,B     20    1.159    0.422    0.681    1.234   -0.552
   6  A,B     30    0.706    0.407   -0.054    0.815    0.066
   7  A,B     45    0.936    0.936   -0.939    1.323    0.709
   8  A,B     60    0.291    0.504   -0.618    0.582    1.062
   9  A,B     20    0.684    0.249   -0.599    0.728    0.822
  10  A,B     30    0.882    0.510    1.652    1.019   -1.621
  11  A,B     45    1.129    1.129    2.676    1.597   -1.675
  12  A,B     60    0.881    1.526   -1.018    1.763    0.578
  13  A        0    0.973    0.000    0.549    0.973   -0.565
  14  A        0    1.358    0.000   -0.324    1.358    0.239
  15  A        0    1.857    0.000    1.417    1.857   -0.763
  16  A        0    0.860    0.000   -0.524    0.860    0.609
  17  A        0    1.448    0.000    1.538    1.448   -1.062
  18  A        0    1.517    0.000   -0.448    1.517    0.295
  19  A        0    0.663    0.000   -0.142    0.663    0.214
  20  A       60    0.480    0.832    0.723    0.961   -0.753
  21  A       20    0.648    0.236   -0.550    0.689    0.798
  22  A       30    1.944    1.122    0.992    2.244   -0.442
  23  A       45    1.120    1.120    0.654    1.584   -0.413
  24  A       60    0.268    0.464   -0.122    0.535    0.228
  25  A       20    0.790    0.288    0.295    0.841   -0.351
  26  A       30    0.442    0.255    0.159    0.510   -0.313
  27  A       45    1.452    1.452    0.019    2.053   -0.009
  28  A       60    0.328    0.568   -0.243    0.656    0.370
  29  A       20    0.744    0.271    0.055    0.792   -0.070
  30  A       30    0.398    0.230    0.315    0.460   -0.686
  31  A       45    0.355    0.355    0.924    0.502   -1.840
  32  A       60    0.465    0.806   -1.060    0.930    1.140
  33  A       20    1.442    0.525   -1.014    1.535    0.661
  34  A       30    1.031    0.595   -0.284    1.191    0.238
  35  A       45    0.879    0.879    1.320    1.244   -1.061
  36  A       60    0.431    0.747   -0.965    0.862    1.119
  37  A       20    0.589    0.214    0.533    0.627   -0.850
  38  A       30    1.144    0.661    2.296    1.321   -1.738
  39  A       45    0.810    0.810    1.050    1.145   -0.917
  40  A       60    0.147    0.254   -0.135    0.293    0.461

Noncompensatory  Data 

For  each  compensatory  item  generated,  a  corresponding 
noncompensatory  item  was  created.  A  noncompensatory  item  was  considered 
corresponding  if  it  had  the  same  probability  of  success  as  the  compensatory 
item  (Ackerman,  1989).  To  accomplish  this,  the  NLIN  procedure  of  SAS  was 
applied  to  Equation  16.  Specifically,  the  compensatory  probability  was 
calculated  for  each  case  and  became  the  dependent  variable.  The 
independent  variable  in  the  NLIN  model  statement  was  the  noncompensatory 
probability  function.  Only  multidimensional  items  were  transformed  as  the 
compensatory/noncompensatory  question  was  not  applicable  to 
unidimensional  items.  Starting  values  for  noncompensatory  parameter 
estimation were set to equal the compensatory parameters. The 1,000 theta
vectors  generated  for  the  first  of  each  compensatory  response  set  were 
treated  as  known  values.  To  ensure  the  program  was  producing  unique  local 
minima,  starting  values  were  changed  for  several  items  in  each  set  and 
reestimated.  Any  differences  which  appeared  in  the  parameter  estimates  were 
contained  in  the  fourth  or  fifth  decimal  place.  For  approximately  10%  of  the 
items  in  each  dataset,  the  convergence  criterion  was  not  met  within  40 
iterations.  In  these  cases,  the  final  parameter  estimates  were  substituted  for 
the  starting  values  and  the  program  rerun.  In  all  such  cases,  convergence  was 
achieved  with  the  second  attempt. 
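The least-squares criterion behind this matching step can be sketched in Python. The sketch shows only the objective function that a nonlinear least-squares routine such as NLIN would minimize; it does not perform the minimization. It assumes a compensatory exponent of a'θ + d and a noncompensatory model formed as the product of two 2PL terms a_k(θ_k - b_k); Equations 10, 11, and 16 lie outside this section, so both functional forms, the trial parameter values, and the function names are assumptions.

```python
import math
import random

def p_comp(theta, a, d):
    """Compensatory M2PL probability (assumed form)."""
    return 1.0 / (1.0 + math.exp(-(a[0] * theta[0] + a[1] * theta[1] + d)))

def p_noncomp(theta, a, b):
    """Noncompensatory probability as a product of per-dimension 2PL terms (assumed form)."""
    p = 1.0
    for tk, ak, bk in zip(theta, a, b):
        p *= 1.0 / (1.0 + math.exp(-ak * (tk - bk)))
    return p

def sse(params, thetas, target):
    """Sum of squared differences between target and noncompensatory probabilities."""
    a1, a2, b1, b2 = params
    return sum((t - p_noncomp(th, (a1, a2), (b1, b2))) ** 2
               for th, t in zip(thetas, target))

rng = random.Random(0)   # arbitrary seed
thetas = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]  # treated as known
target = [p_comp(th, (0.736, 1.275), 1.199) for th in thetas]       # dependent variable
trial = (0.7, 1.0, -1.0, 0.0)   # hypothetical trial values for (a1, a2, b1, b2)
loss = sse(trial, thetas, target)
```

An optimizer would adjust the four noncompensatory parameters until this sum stops decreasing; restarting from different trial values, as described above, is a check against local minima.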


Response vectors were generated by applying Equation 10 and using
the same (θ_1, θ_2) combinations utilized to produce the corresponding
compensatory  responses.  Twenty  response  sets  were  simulated  for  each 
noncompensatory  dataset.  The  item  parameters  for  the  multidimensional  Form 
A  items  of  noncompensatory  MD30  are  shown  in  Table  6.  Summary  statistics 
for  datasets  of  both  models  are  displayed  in  Table  7. 
Nonequivalent Groups

One  of  the  strongest  theoretical  advantages  of  IRT  is  its  usefulness  with 
groups  of  subjects  who  differ  in  abilities.  One  case  where  this  may  occur  is 
when  a  second  form  of  a  test,  such  as  a  high  school  proficiency  test,  is 
administered  only  to  examinees  who  failed  to  pass  the  first  attempt.  To 
examine  the  effect  of  data  from  a  lower  ability  group  being  equated  to  data 
gathered  from  a  normally  distributed  group,  sets  of  1,000  less  able  simulees 
were generated. Scores on θ_1 for the lower group ranged between -3.00 and
0.00 with mean -0.80 and standard deviation 0.6. Abilities on the second
dimension were normally distributed with mean 0 and standard deviation 1.
Five  replications  of  scores  were  generated  for  all  four  compensatory  test 
conditions. 

Estimation  of  Parameters 
Unidimensional  IRT 

The responses of the 1,000 simulated examinees in each response set
were  analyzed  by  the  computer  program  BILOG  (Mislevy  &  Bock,  1990)  to 
estimate  the  unidimensional  item  discrimination  and  difficulty  parameters. 


Table 6

Simulated Noncompensatory Parameters for Multidimensional Items, MD30 Form A

Item  Form  α_i1      a_1      a_2      b_1      b_2
   4  A,B     60    0.664    0.888   -0.945    0.309
   5  A,B     20    0.778    0.528    0.236   -2.081
   6  A,B     30    0.528    0.447   -0.713   -2.092
   7  A,B     45    0.705    0.698   -1.534   -1.596
   8  A,B     60    0.352    0.395   -3.164   -1.776
   9  A,B     20    0.478    0.390   -1.175   -3.624
  10  A,B     30    0.638    0.494    0.834   -0.661
  11  A,B     45    0.849    0.844    0.555    0.565
  12  A,B     60    0.728    0.942   -1.999   -0.964
  20  A       60    0.496    0.606   -1.268    0.047
  21  A       20    0.184    0.149   -1.188   -4.872
  22  A       30    2.256    0.235   -0.426    0.705
  23  A       45    0.830    0.792   -0.481   -0.491
  24  A       60    0.692    0.248   -4.282    0.835
  25  A       20    0.545    0.413   -0.089   -2.602
  26  A       30    0.344    0.306   -0.745   -2.472
  27  A       45    0.957    0.910   -0.750   -0.786
  28  A       60    0.381    0.436   -2.495   -1.136
  29  A       20    0.516    0.400   -0.361   -2.876
  30  A       30    0.310    0.276   -0.561   -2.430
  31  A       45    0.312    0.315   -0.297   -0.388
  32  A       60    0.499    0.585   -2.774   -1.633
  33  A       20    0.918    0.610   -0.856   -2.870
  34  A       30    0.725    0.578   -0.701   -1.944
  35  A       45    0.698    0.677   -0.060   -0.058
  36  A       60    0.474    0.551   -2.809   -1.638
  37  A       20    0.412    0.322    0.187   -2.772
  38  A       30    0.814    0.584    1.073   -0.399
  39  A       45    0.653    0.636   -0.219   -0.231
  40  A       60    0.096    0.207   -4.957   -3.588


Table 7

Summary Statistics for Multidimensional Items in Compensatory and Noncompensatory Datasets

                      MD10            MD20            MD30            MD40
Parameter           C      NC       C      NC       C      NC       C      NC
a_1    Mean       .84     .67     .37     .60     .86     .66     .91     .69
       SD         .55     .33     .80     .28     .50     .35     .50     .37
a_2    Mean       .71     .61     .75     .56     .68     .55     .70     .62
       SD         .36     .23     .47     .30     .39     .23     .38     .27
d_i    Mean       .11             .00             .25             .14
       SD         .97            1.12            1.11            1.07
b_1    Mean             -1.09           -1.08           -1.02            -.88
       SD                1.03            1.09            1.28            1.16
b_2    Mean             -1.37           -1.54           -1.35           -1.23
       SD                 .94            1.80            1.37            1.18

Note. C = Compensatory item parameters; NC = Noncompensatory item parameters.

Program  default  values  were  used  in  the  calibration  of  the  two-parameter 
logistic  model  item  parameters.  Specifically,  this  involved  marginal  maximum 
likelihood  estimation  procedures,  no  priors  specified  for  difficulties,  and 
lognormal priors for discrimination parameters. For the randomly equivalent
groups, each of the 160 response sets (20 replications each for four
compensatory and four noncompensatory multidimensional conditions) was
analyzed twice. First, the responses for combined Forms A and B for each
dataset were analyzed simultaneously. Then each form was analyzed separately.
The procedure was repeated for the nonequivalent groups. This
resulted in a total of 520 BILOG runs.
Analytical  Estimation 

Unidimensional  estimation  of  the  multidimensional  item  parameters  for 
the  eight  datasets  was  performed  analytically  using  Wang's  (1986)  procedure. 
The  SAS  IML  procedure  was  employed  to  determine  the  unidimensional 
estimates  of  the  two-dimensional  item  parameters  for  each  of  the  eight 
conditions. 

Equating 

In IRT, because the ICCs are population independent, item parameter
estimates from two BILOG runs should theoretically be identical. However,
P_i(θ) in the 2PL model is a function of the quantity a_i(θ - b_i). As such, the
origin and the unit of measurement of θ and b_i are arbitrary, or indeterminate.
Any scale may be selected for θ as long as the same scale is chosen for b_i.
Estimated abilities and item difficulties from two calibration runs should have a
linear relationship to each other (Petersen et al., 1989). Equating is a
procedure used to place the item parameters from two tests on the same
scale.
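The indeterminacy can be checked numerically. The short Python sketch below uses arbitrary, hypothetical values for the slope, intercept, ability, and item parameters, and shows that a 2PL probability is unchanged when θ and b are carried through the same linear transformation and a is divided by the slope.

```python
import math

def p_2pl(theta, a, b):
    """Two-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

A, B = 1.25, -0.40              # arbitrary slope and intercept of a new scale
theta, a, b = 0.7, 1.1, -0.3    # hypothetical ability and item parameters

p_original = p_2pl(theta, a, b)
# Rescale theta and b linearly and divide a by the slope:
# (a / A) * ((A*theta + B) - (A*b + B)) reduces to a * (theta - b).
p_rescaled = p_2pl(A * theta + B, a / A, A * b + B)
```

The two probabilities agree to floating-point precision, which is why separately calibrated forms need a linking step before their parameters can be compared.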

Three  unidimensional  IRT  equating  methods  were  selected  for  this 
study:  (a)  concurrent  calibration,  (b)  equated  bs,  and  (c)  characteristic  curve 
transformation. 
Concurrent  Calibration 

Concurrent  calibration  is  the  simplest  of  the  IRT  methods  of  equating  to 
implement.  A  common  group  of  examinees  or  items  is  required  to  tie  the 
information  from  the  two  tests  together.  For  this  study,  the  parameters  of  both 
forms  were  estimated  simultaneously  by  BILOG.  Twelve  common  items  in 
each  dataset  served  to  link  the  forms  and  the  resulting  item  parameter 
estimates  were  therefore  on  the  same  scale.  This  process  was  repeated  for 
each  of  the  response  sets  in  each  condition. 
Equated  bs 

The equated bs method is based on determining the linear relationship that
exists between item difficulties estimated in two separate BILOG calibration
runs, one for each form. The means and standard deviations of the b_i values for
each set of linking items from Forms A and B were calculated. The linear
transformation was determined by

b_B^* = \frac{s_A}{s_B}(b_B - \bar{X}_B) + \bar{X}_A  (19)

where \bar{X} and s denote the mean and standard deviation of the linking-item
difficulties on the indicated form, so that A = s_A / s_B and B = \bar{X}_A - A\bar{X}_B.
Once the slope (A) and intercept (B) of the linear transformation were found,
they were applied to all ability and item estimates for Form B, yielding

b_i^* = A b_i + B  (20)

a_i^* = \frac{a_i}{A}  (21)

\theta^* = A\theta + B  (22)

All parameters were now transformed to the same scale. Although item
discrimination or ability estimates could have been used to determine the linear
transformation, item difficulty estimates are usually used in practice because
they yield the most stable parameter estimates (Cook & Eignor, 1991).
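The equated bs computation can be sketched in Python as follows. The linking-item difficulties are hypothetical values constructed so that the Form B estimates are an exact linear function of the Form A estimates; the function names are illustrative and are not taken from the programs used in the study.

```python
import statistics

def equated_bs_constants(b_link_a, b_link_b):
    """Slope A and intercept B that carry Form B difficulties onto the Form A scale."""
    A = statistics.pstdev(b_link_a) / statistics.pstdev(b_link_b)
    B = statistics.mean(b_link_a) - A * statistics.mean(b_link_b)
    return A, B

def rescale_form_b(a_vals, b_vals, thetas, A, B):
    """Apply the transformation to Form B discriminations, difficulties, and abilities."""
    return ([a / A for a in a_vals],
            [A * b + B for b in b_vals],
            [A * t + B for t in thetas])

# Hypothetical linking-item difficulties; the Form B values are an exact linear
# function of the Form A values, so the constants 1.2 and 0.5 should be recovered.
b_link_a = [-1.6, -0.9, -0.4, 0.0, 0.3, 0.8, 1.1, 1.7]
b_link_b = [(b - 0.5) / 1.2 for b in b_link_a]
A, B = equated_bs_constants(b_link_a, b_link_b)
a_star, b_star, theta_star = rescale_form_b([1.2, 0.6], b_link_b, [0.0], A, B)
```

With error-free estimates the rescaled Form B difficulties coincide with the Form A difficulties; with real calibrations the match is only approximate, and the quality of the link depends on the stability of the linking-item estimates.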
Characteristic  Curve  Transformation 

The parameter estimates computed separately for Form A and Form B
were also used in the characteristic curve transformation. This equating
method used both a_i and b_i estimates from the linking items to derive a linear
transformation through an iterative process that minimized the difference
between the item parameter estimates of the linking items. The process is
based on the assumption that if the estimates were free of error, choosing the
proper linear transformation would cause the true-score estimates of the
linking items to correspond (Petersen et al., 1989; Stocking & Lord, 1983).
The resulting transformation was then applied to all Form B parameters to
create estimates on the same scale. The EQUATE (Baker, Al-Karni, &
Al-Dosary, 1991) computer program was used to accomplish this. Data were
examined at 80 points along the ICC and the transformation was generally
identified after approximately 8-10 iterations.
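The criterion minimized by a characteristic curve method can be sketched in Python. The sketch assumes a Stocking and Lord (1983) type loss: the mean squared difference between the linking-item true scores computed from Form A estimates and from rescaled Form B estimates, evaluated at 80 evenly spaced ability points. It is not the EQUATE program; the ability range of -4 to 4, the item values, and the function names are hypothetical, and the iterative search for A and B is omitted.

```python
import math

def p_2pl(theta, a, b):
    """Two-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def true_score(theta, items):
    """Test characteristic curve: expected number correct on the linking items."""
    return sum(p_2pl(theta, a, b) for a, b in items)

def characteristic_curve_loss(A, B, link_a, link_b, thetas):
    """Mean squared true-score difference after rescaling Form B by (A, B)."""
    rescaled = [(a / A, A * b + B) for a, b in link_b]
    return sum((true_score(t, link_a) - true_score(t, rescaled)) ** 2
               for t in thetas) / len(thetas)

thetas = [-4.0 + 8.0 * k / 79 for k in range(80)]            # 80 evaluation points
link_a = [(1.0, -1.0), (0.8, 0.0), (1.4, 0.6), (1.1, 1.5)]   # hypothetical (a, b) pairs
true_A, true_B = 1.2, 0.5
# Error-free Form B estimates of the same items, expressed on the Form B scale
link_b = [(a * true_A, (b - true_B) / true_A) for a, b in link_a]

loss_at_truth = characteristic_curve_loss(true_A, true_B, link_a, link_b, thetas)
loss_identity = characteristic_curve_loss(1.0, 0.0, link_a, link_b, thetas)
```

With error-free estimates the loss is essentially zero at the correct slope and intercept and larger elsewhere; an iterative routine searches for the pair (A, B) that minimizes it.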

All  three  equating  procedures  described  were  applied  to  each  of  the 
replications  for  each  of  the  twelve  data  conditions.  This  resulted  in  660 
equatings  for  this  study.  A  summation  of  the  research  equating  conditions  is 
presented  in  Table  8. 


Table 8

Summation of Research Equating Conditions

                              Equating Method
              Concurrent      Equated      Characteristic
Dataset       Calibration     bs           Curve

Compensatory, Randomly Equivalent Groups
  MD10             ✓             ✓              ✓
  MD20             ✓             ✓              ✓
  MD30             ✓             ✓              ✓
  MD40             ✓             ✓              ✓

Noncompensatory, Randomly Equivalent Groups
  MD10             ✓             ✓              ✓
  MD20             ✓             ✓              ✓
  MD30             ✓             ✓              ✓
  MD40             ✓             ✓              ✓

Compensatory, Nonequivalent Groups
  MD10             ✓             ✓              ✓
  MD20             ✓             ✓              ✓
  MD30             ✓             ✓              ✓
  MD40             ✓             ✓              ✓

Evaluation  Criteria 

To  establish  a  foundation  for  evaluating  the  results  of  the  research 
equatings,  the  three  comparison  conditions  described  below  were  used.  In 
addition, three statistical criteria (correlation, standardized mean difference,
and standardized root mean square difference) were applied to the data.
Comparison  Conditions 

For the first comparison condition, the unidimensional approximations
of  the  multidimensional  item  parameters  were  calculated  using  the  analytic 
procedure  described  by  equations  17  and  18  (Wang,  1986).  To  compute 
these  approximations  for  the  eight  research  conditions,  the  SAS  IML  procedure 
was  applied  to  each  of  the  simulated  parameter  sets.  The  means  and 
standard  deviations  of  the  responses  for  each  condition  were  determined  for 
inclusion  in  the  formula.  The  resulting  sets  of  unidimensional  comparison  item 
parameters  were  weighted  composites  of  the  item  parameters  for  the  two  traits 
(Ackerman,  1988).  Table  9  presents  the  analytical  unidimensional  item 
parameter  approximations  for  compensatory  MD30,  Form  A.  The  resulting 
analytical  item  parameter  estimates  were  then  fixed  in  BILOG  386  and  all 
compensatory  and  noncompensatory  response  sets  were  analyzed  to  establish 
the  comparison  ability  estimates. 

Table 9

Analytical Estimates of the Unidimensional Parameters for Compensatory MD30,
Form A

Item    Discrimination    Difficulty

  1         0.242            1.408
  2         0.286            0.352
  3         0.262           -1.449
  4         0.679           -0.949
  5         0.712           -0.559
  6         0.479            0.066
  7         0.733            0.737
  8         0.289            1.237
  9         0.422            0.833
 10         0.599           -1.622
 11         0.876           -1.742
 12         0.786            0.673
 13         0.482           -0.646
 14         0.650            0.273
 15         0.842           -0.874
 16         0.429            0.698
 17         0.687           -1.216
 18         0.715            0.338
 19         0.335            0.245
 20         0.466           -0.877
 21         0.400            0.808
 22         1.320           -0.442
 23         0.868           -0.429
 24         0.267            0.265
 25         0.487           -0.355
 26         0.300           -0.312
 27         1.103           -0.010
 28         0.325            0.432
 29         0.459           -0.070
 30         0.270           -0.685
 31         0.283           -1.913
 32         0.452            1.327
 33         0.882            0.670
 34         0.700            0.239
 35         0.690           -1.104
 36         0.421            1.304
 37         0.363           -0.862
 38         0.777           -1.734
 39         0.637           -0.953
 40         0.148            0.536

For the next comparison condition, the second dimension of each
multidimensional item was ignored. This would be reasonable if arguing that
most published tests were designed to measure only the first factor. For
example, although mathematics problem solving requires reading skills to
understand the prompts, the reading level is usually well below the grade level
being tested. In this study, the simulated ability parameters of the first
dimension only from each compensatory and noncompensatory dataset were
utilized. This comparison criterion would enable evaluation of how well the
dominant first factor was recovered in the equatings.

A  third  comparison  condition  was  created  which  employed  the 
averages of the two true θ values. This condition was based on the parameter
estimation studies of Yen (1984) and Ansley and Forsyth (1985) in which the
unidimensional estimates of the θ parameters appeared to be combinations of
the  true  multidimensional  abilities. 
Statistical  Criteria 

Correlation coefficients between the simulated θ and the equated θ
estimates were computed to establish the relationship between the comparison
criterion and the research equatings for each condition. For concurrent
calibration, the appropriate simulated θ parameters were correlated with the
corresponding estimated ability parameters for both Form A and Form B. Only
the  equated  form,  Form  B,  was  compared  to  the  comparison  conditions  for  all 
other  equating  procedures. 

The  standardized  difference  between  means  (SDM)  is  the  difference  in 
mean  scores  for  the  two  sets  of  ability  traits  divided  by  a  pooled  estimate  of  the 
standard deviation

$$S = \sqrt{\frac{S_1^2 + S_2^2}{2}} \tag{23}$$

where S_1^2 and S_2^2 are the variances of the two sets of abilities (Yen, 1984).
The  means  of  the  estimated  ability  parameters  were  subtracted  from  the 
means  of  each  comparison  condition  to  calculate  this  statistic. 

The  standardized  root  mean  square  difference  is  the  square  root  of  the 
mean  squared  difference  between  examinees'  trait  estimates,  divided  by  S. 
Again, the estimated θ parameter values were subtracted from the appropriate
comparison  values  to  derive  the  criterion  value. 
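The three statistical criteria can be sketched together in Python. This is an illustration under stated assumptions rather than the code used in the study: the function names are hypothetical, population (divide-by-n) variances are assumed for Equation 23, and differences are taken as comparison minus estimate, following the description above.

```python
import math
from statistics import mean, pvariance


def pooled_sd(x, y):
    """Equation 23: square root of the average of the two variances."""
    return math.sqrt((pvariance(x) + pvariance(y)) / 2.0)


def correlation(x, y):
    """Pearson correlation between two equally long lists of abilities."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sdm(comparison, estimated):
    """Standardized difference between means (comparison minus estimate)."""
    return (mean(comparison) - mean(estimated)) / pooled_sd(comparison, estimated)


def srmsd(comparison, estimated):
    """Standardized root mean square difference between paired abilities."""
    squared = [(c - e) ** 2 for c, e in zip(comparison, estimated)]
    return math.sqrt(mean(squared)) / pooled_sd(comparison, estimated)
```

Under this sign convention a negative SDM means the estimated abilities are, on average, higher than the comparison values.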

Summary 

Four  test  conditions  with  differing  numbers  of  multidimensional  items 
were  simulated  using  the  compensatory  M2PL  item  response  theory  model. 
The  item  direction  for  multidimensional  items  was  varied  within  each  test. 
Comparable  noncompensatory  datasets  were  then  created  for  each  condition. 
Two 40-item forms were constructed for each situation, each consisting of 12
linking and 28 unique items. Responses for 1,000 normally distributed simulated
examinees were generated through application of the appropriate probability
equation and replicated 20 times. The same (θ1, θ2) combinations were used to
generate corresponding compensatory and noncompensatory response sets.
In addition, responses for 1,000 low-ability examinees were generated with 5
replications for each compensatory test condition.
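The data generation summarized above can be sketched as follows. This is a minimal illustration, not the study's generation program: the function names and seeds are hypothetical, the logistic forms are the usual compensatory and noncompensatory M2PL expressions, and the absence of a scaling constant is an assumption.

```python
import math
import random


def p_compensatory(t1, t2, a1, a2, d):
    """Compensatory M2PL: the two abilities combine additively in the exponent."""
    return 1.0 / (1.0 + math.exp(-(a1 * t1 + a2 * t2 + d)))


def p_noncompensatory(t1, t2, a1, a2, b1, b2):
    """Noncompensatory M2PL: product of one logistic term per dimension."""
    p1 = 1.0 / (1.0 + math.exp(-a1 * (t1 - b1)))
    p2 = 1.0 / (1.0 + math.exp(-a2 * (t2 - b2)))
    return p1 * p2


def draw_abilities(n_examinees=1000, seed=0):
    """Uncorrelated standard normal (theta1, theta2) pairs."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_examinees)]


def generate_responses(thetas, items, prob, seed=0):
    """Score 1 when a uniform draw falls below the model probability, else 0."""
    rng = random.Random(seed)
    return [[1 if rng.random() < prob(t1, t2, *item) else 0 for item in items]
            for t1, t2 in thetas]
```

Drawing the abilities once and passing the same list to `generate_responses` with each model mirrors the use of identical (θ1, θ2) combinations for the compensatory and noncompensatory response sets.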



Parameter  estimation  was  executed  on  all  conditions  using  both 
unidimensional  IRT  procedures  and  analytical  estimation.  For  the  IRT 
parameter estimates, equating was performed through three techniques: (a)
concurrent  calibration,  (b)  equated  bs,  and  (c)  characteristic  curve 
transformation. 

Three comparison conditions (the first simulated theta, the average of
theta 1 and theta 2, and the analytical estimations of the unidimensional
parameters) were selected for comparison with equated ability estimates.
Finally,  the  three  statistical  procedures  of  correlation,  standardized  mean 
difference,  and  standardized  root  mean  square  difference  were  applied  to 
examine  the  comparisons. 


CHAPTER  4 
RESULTS  AND  DISCUSSION 


Simulated  Data 
Item  Parameters 

Item parameters for two 40-item forms of a test were generated with a
compensatory  multidimensional  2PL  model.  Four  conditions  were  created  with 
either  10,  20,  30,  or  40  multidimensional  items  in  each  form.  Four  degrees  of 
dimensionality  were  spiraled  throughout  each  test  and  form.  Each  form 
contained  twelve  linking  items  that  mirrored  the  total  test  in  psychometric 
properties.  Additionally,  Forms  A  and  B  were  designed  to  be  randomly  parallel. 

Examination  of  the  simulated  compensatory  item  parameters  confirms 
this  was  accomplished.  Descriptive  statistics  for  the  four  compensatory  Form  A 
conditions  are  presented  in  Table  10  and  Form  B  data  are  shown  in  Table  11. 
All  generated  values  are  within  the  limits  found  in  published  tests  and  described 
in previous empirical studies (Doody-Bogan & Yen, 1983; Ackerman, 1988).
For both forms and across all conditions, the means of the d_i parameters
approach  0.0  with  standard  deviations  of  approximately  1.0.  The  means  and 
standard  deviations  of  all  item  parameters  for  both  forms  are  similar. 
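The MDISC and MID summaries reported alongside a1, a2, and d can be computed from the compensatory parameters. The sketch below assumes the standard definitions of multidimensional discrimination and difficulty (MDISC as the length of the discrimination vector, MID as -d divided by MDISC) and uses hypothetical function names; the example values are those of the α = 20° item shown in the figures.

```python
import math


def mdisc(a1, a2):
    """Multidimensional discrimination: length of the (a1, a2) vector."""
    return math.hypot(a1, a2)


def mid(a1, a2, d):
    """Multidimensional difficulty: signed distance from the origin, -d / MDISC."""
    return -d / mdisc(a1, a2)


def direction_degrees(a1, a2):
    """Angle of the item's direction of best measurement from the theta1 axis."""
    return math.degrees(math.atan2(a2, a1))


# Item with a1 = .732, a2 = .266, d = -.104
item = (0.732, 0.266, -0.104)
summary = (mdisc(item[0], item[1]), mid(*item), direction_degrees(item[0], item[1]))
```

The negative sign in MID explains why the mean d and the mean MID have opposite signs in the descriptive tables.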

Table 10

Descriptive Statistics for Compensatory Form A Item Parameters

Parameter    Condition    Minimum    Maximum     Mean     SD

a1              10          0.29       3.49      1.15     0.8
                20          0.30       2.41      0.89     0.4
                30          0.15       1.94      0.84     0.4
                40          0.28       2.45      0.98     0.6

a2              10          0.00       1.22      0.17     0.3
                20          0.00       1.87      0.42     0.6
                30          0.00       1.53      0.49     0.4
                40          0.21       1.63      0.71     0.3

d               10         -2.27       2.18      0.08     1.1
                20         -2.44       2.76      0.20     1.1
                30         -1.06       2.68      0.25     0.9
                40         -2.90       2.78      0.17     1.2

MDISC           10          0.41       3.49      1.23     0.8
                20          0.30       2.41      1.08     0.5
                30          0.29       2.24      1.04     0.5
                40          0.57       2.61      1.25     0.6

MID             10         -1.94       1.62     -0.11     0.9
                20         -1.86       1.83     -0.09     0.9
                30         -1.84       1.23     -0.17     0.9
                40         -1.43       1.73     -0.10     0.8

Note. N = 40 items in each condition.

The multidimensional compensatory item parameters were then
transformed into their noncompensatory correlates. Descriptive statistics for
Form A conditions are presented in Table 12 and Form B information is given in
Table 13. The item parameter values calculated from the noncompensatory
transformations are within the ranges given by Ackerman (1989). For all
conditions and in both forms, b2 is slightly less difficult than b1, and a2 is less
discriminating than a1.

Table 11

Descriptive Statistics for Compensatory Form B Item Parameters

Parameter    Condition    Minimum    Maximum     Mean     SD

a1              10          0.37       3.65      1.14     0.8
                20          0.27       2.41      0.96     0.5
                30          0.15       2.11      0.94     0.5
                40          0.27       2.45      0.88     0.5

a2              10          0.00       1.37      0.20     0.4
                20          0.00       1.58      0.36     0.5
                30          0.00       2.11      0.55     0.5
                40          0.18       2.27      0.71     0.4

d               10         -2.55       6.23      0.30     1.5
                20         -1.87       2.76      0.00     1.1
                30         -3.30       4.65      0.10     1.3
                40         -2.90       2.78      0.20     1.2

MDISC           10          0.39       3.65      1.23     0.8
                20          0.32       2.41      1.12     0.5
                30          0.30       2.98      1.16     0.6
                40          0.42       2.62      1.18     0.5

MID             10         -1.71       1.96     -0.13     0.9
                20         -1.88       1.58      0.08     0.9
                30         -1.68       1.77     -0.02     0.9
                40         -1.79       1.94     -0.09     0.8

Note. N = 40 items in each condition.

Table 12

Descriptive Statistics for Multidimensional Item Parameters in Noncompensatory
Form A

Parameter    Condition    Minimum    Maximum     Mean     SD

a1              10          0.27       0.99      0.63     0.3
                20          0.10       1.10      0.60     0.3
                30          0.10       2.26      0.63     0.4
                40          0.33       2.65      0.76     0.4

a2              10          0.00       1.22      0.17     0.3
                20          0.15       0.94      0.52     0.2
                30          0.00       1.53      0.49     0.4
                40          0.32       1.14      0.62     0.2

b1              10         -3.44       0.86     -1.20     1.1
                20         -2.94       1.68     -0.79     1.2
                30         -4.96       1.07     -1.07     1.4
                40         -3.55       1.93     -0.92     1.2

b2              10         -3.01      -0.54     -1.43     0.8
                20         -5.75       2.34     -1.29     2.0
                30         -4.87       0.84     -1.45     1.4
                40         -3.10       3.98     -1.12     1.2

Note. The number of multidimensional items is the same as the condition number.

In all cases, the noncompensatory b_i parameters are lower than the MID_i
for the corresponding item. This may be explained by considering the method
used to calculate the transformations. A compensatory and a noncompensatory
item were considered corresponding if, for each (θ1, θ2) combination, the
probability of a correct response was the same on both items. Because the
noncompensatory model does not allow a high ability on one trait to compensate
for a low ability on the other dimension, the b_i parameters on a
noncompensatory item must be smaller than the MID_i parameter of the
compensatory item if the condition for items to be corresponding is to be met.

Table 13

Descriptive Statistics for Multidimensional Item Parameters in Noncompensatory
Form B

Parameter    Condition    Minimum    Maximum     Mean     SD

a1              10          0.33       1.54      0.69     0.4
                20          0.13       1.10      0.59     0.3
                30          0.33       1.33      0.69     0.3
                40          0.10       1.58      0.64     0.3

a2              10          0.29       0.97      0.64     0.3
                20          0.10       1.12      0.57     0.3
                30          0.16       0.83      0.63     0.4
                40          0.25       1.82      0.64     0.3

b1              10         -2.32       0.57     -0.96     0.8
                20         -3.22       0.27     -1.21     0.9
                30         -3.30       1.65      0.10     1.3
                40         -4.06       1.86     -0.79     1.1

b2              10         -3.04       0.33     -1.25     1.0
                20         -4.90       0.69     -1.53     1.5
                30         -3.62       1.03     -1.24     1.3
                40         -3.51       0.82     -1.32     1.1

Note. The number of multidimensional items is the same as the condition number.

The differences between the compensatory and noncompensatory M2PL
models can also be shown graphically. Because the probability of a correct
response varies as a function of the θ in each model, the item response
surfaces (IRS) and contour plots of matched items should differ. The
compensatory  and  corresponding  noncompensatory  model  IRS  and  contour  plot 
for  an  item  of  each  degree  of  dimensionality  are  shown  in  Figures  4  through  7. 
In Figure 4, a matched item that discriminates predominantly on θ1 (α = 20°)
is pictured. The differences between the two IRSs are minor. A similarity also
exists in the two conditions where the degree of dimensionality is 15° from
equally discriminating. Figure 5 shows the IRS for α = 30°, which discriminates
slightly more on θ1 than on θ2. Conversely, Figure 7 presents the graphs for α =
60°, which discriminates slightly more on θ2 than on θ1. Although differences
exist in the baselines, the curves of the IRSs remain similar. This is true both
within each of the two matched sets and between the items with α = 30° and
α = 60°. In Figure 6, where α = 45°, the corresponding compensatory and
noncompensatory items discriminate equally along θ1 and θ2, and there is a
sharp  contrast  between  corresponding  curves. 
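The contrast between the two models can be checked numerically. The sketch below evaluates both response functions for the matched α = 45° item, using the parameter values printed on the figure panels; the function names are hypothetical, and the logistic forms without a scaling constant are an assumption.

```python
import math


def p_compensatory(t1, t2, a1, a2, d):
    """Compensatory M2PL response probability."""
    return 1.0 / (1.0 + math.exp(-(a1 * t1 + a2 * t2 + d)))


def p_noncompensatory(t1, t2, a1, a2, b1, b2):
    """Noncompensatory M2PL response probability (product of two logistic terms)."""
    p1 = 1.0 / (1.0 + math.exp(-a1 * (t1 - b1)))
    p2 = 1.0 / (1.0 + math.exp(-a2 * (t2 - b2)))
    return p1 * p2


COMP_ITEM = (1.223, 1.223, 1.994)            # a1, a2, d
NONCOMP_ITEM = (0.970, 0.933, -0.951, -0.913)  # a1, a2, b1, b2

# An average examinee versus one who is very high on theta1 and very low on theta2.
balanced = (0.0, 0.0)
lopsided = (3.0, -3.0)

comp_balanced = p_compensatory(*balanced, *COMP_ITEM)
comp_lopsided = p_compensatory(*lopsided, *COMP_ITEM)
noncomp_balanced = p_noncompensatory(*balanced, *NONCOMP_ITEM)
noncomp_lopsided = p_noncompensatory(*lopsided, *NONCOMP_ITEM)
```

Because a1 equals a2 for this compensatory item, the surplus on θ1 exactly offsets the deficit on θ2 and the two compensatory probabilities coincide, whereas the noncompensatory probability drops sharply for the lopsided examinee.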

Figure 3. Item response surfaces and contour plots for item 9, MD20, α = 20°.
Panels: (a) compensatory IRS (a1 = .732, a2 = .266, d = -.104);
(b) noncompensatory IRS (a1 = .526, a2 = .378, b1 = -.595, b2 = -2.961);
(c) compensatory contour plot; (d) noncompensatory contour plot.

Figure 4. Item response surfaces and contour plots for item 10, MD20, α = 30°.
Panels: (a) compensatory IRS (a1 = .934, a2 = .539, d = .650);
(b) noncompensatory IRS (a1 = .709, a2 = .526, b1 = -.092, b2 = -1.177);
(c) compensatory contour plot; (d) noncompensatory contour plot.

Figure 5. Item response surfaces and contour plots for item 11, MD20, α = 45°.
Panels: (a) compensatory IRS (a1 = 1.223, a2 = 1.223, d = 1.994);
(b) noncompensatory IRS (a1 = .970, a2 = .933, b1 = -.951, b2 = -.913);
(c) compensatory contour plot; (d) noncompensatory contour plot.

Figure 6. Item response surfaces and contour plots for item 12, MD20, α = 60°.
Panels: (a) compensatory IRS (a1 = .516, a2 = .894, d = .011);
(b) noncompensatory IRS (a1 = .542, a2 = .664, b1 = -1.764, b2 = -.538);
(c) compensatory contour plot; (d) noncompensatory contour plot.

Similar conclusions can be drawn from examination of the equiprobability
lines of the contour plots. For the compensatory model, parallel lines join the
(θ1, θ2) combinations that have an equal probability of a correct response. The
incline of these lines is a function of the discrimination parameters. However,
because the noncompensatory model does not allow a high ability on one
dimension to compensate for a low ability on another dimension, the lines
connecting the (θ1, θ2) combinations are curvilinear. The direction of these lines
in the noncompensatory model is a function of the item's difficulty parameters
(Ackerman, 1989). Examination of Figures 4 through 7 indicates that, like the
IRS, as a multidimensional noncompensatory item approaches equal
discrimination on the two abilities, the curvature of the equiprobability lines
increases greatly.
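The shape of the equiprobability lines follows directly from the two response functions. The sketch below solves each model for the θ2 value that yields a target probability p at a given θ1; it assumes the usual logistic forms without a scaling constant, and the function names are hypothetical.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def compensatory_contour(theta1, p, a1, a2, d):
    """theta2 on the p-contour: a straight line with slope -a1 / a2."""
    return (logit(p) - d - a1 * theta1) / a2


def noncompensatory_contour(theta1, p, a1, a2, b1, b2):
    """theta2 on the p-contour, or None when theta1 alone caps the probability below p."""
    p1 = logistic(a1 * (theta1 - b1))
    if p1 <= p:
        return None  # no amount of theta2 can compensate for this theta1
    return b2 + logit(p / p1) / a2
```

For the compensatory model the slope depends only on the ratio of the discriminations, which is why the contours are parallel lines. For the noncompensatory model each contour bends toward a vertical and a horizontal asymptote whose locations shift with b1 and b2, and the `None` branch marks the region where a low θ1 cannot be offset by any θ2.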
Analytical  Estimation 

Unidimensional  estimates  of  the  multidimensional  item  parameters  for 
both  compensatory  and  noncompensatory  conditions  were  calculated  using  the 
procedure  described  by  Wang  (1986).  Although  this  procedure  was  developed 
for  the  compensatory  model,  the  purpose  of  this  research  merited  its  use  with 
noncompensatory  data  also.  Descriptive  statistics  are  presented  in  Table  14  for 
Form  A  and  Table  15  for  Form  B.  For  the  compensatory  conditions,  the 
analytical  difficulty  estimates  approximate  the  simulated  MID  values.  However, 
variation  is  found  in  the  discrimination  parameters.  This  occurs  because  the 
analytical  solution  projects  the  discriminations  onto  a  reference  composite 
vector.  This  same  pattern  is  repeated  for  the  noncompensatory  conditions. 
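The projection idea can be illustrated for the two-dimensional case. This sketch covers only the direction-and-projection step, not Wang's full analytical procedure: taking the reference direction as the principal axis of the summed outer products of the item discrimination vectors is an assumption here, and the function names are hypothetical.

```python
import math


def reference_direction(items):
    """Unit vector along the principal axis of the sum of a a' over items (a1, a2)."""
    s11 = sum(a1 * a1 for a1, _ in items)
    s22 = sum(a2 * a2 for _, a2 in items)
    s12 = sum(a1 * a2 for a1, a2 in items)
    # Principal-axis angle of a symmetric 2 x 2 matrix.
    angle = 0.5 * math.atan2(2.0 * s12, s11 - s22)
    return math.cos(angle), math.sin(angle)


def project(items, direction):
    """Length of each discrimination vector along the reference direction."""
    w1, w2 = direction
    return [a1 * w1 + a2 * w2 for a1, a2 in items]
```

An item whose own direction departs from the reference direction keeps only part of its discrimination after projection, which is consistent with the analytical discriminations varying more than the analytical difficulties.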
Simulated  Ability  Data 

For each condition, 1,000 examinees were simulated from a normal
distribution with mean 0.0 and standard deviation 1.0 on both θ1 and θ2. The
two thetas were assumed to be uncorrelated. The thetas were then applied in
the appropriate compensatory or noncompensatory probability equation and
responses were generated. Twenty response sets were generated for each
condition. Inspection of the descriptive statistics contained in Tables 16
through 19 verifies the success of the data generation. Most ability values are
between -3.00 and +3.00 with the mean and standard deviation specified. The
correlation between θ1 and θ2 is approximately zero in all conditions.

Table 14

Descriptive Statistics for Analytical Unidimensional Estimates of Form A Item
Parameters

Parameter         Condition    Minimum    Maximum     Mean     SD

Compensatory Model

Discrimination       10          0.18       2.01      0.67     0.5
                     20          0.16       1.13      0.55     0.2
                     30          0.15       1.32      0.55     0.3
                     40          0.30       1.35      0.69     0.3

Difficulty           10         -1.95       2.63     -0.11     1.0
                     20         -2.05       2.02     -0.09     1.0
                     30         -1.91       1.41     -0.16     0.9
                     40         -1.49       1.85     -0.11     0.9

Noncompensatory Model

Discrimination       10          0.22       2.01      0.68     0.4
                     20          0.09       1.10      0.48     0.2
                     30          0.12       0.93      0.47     0.2
                     40          0.30       1.11      0.57     0.2

Difficulty           10         -4.04       1.63     -0.69     1.3
                     20         -4.86       2.02     -0.75     1.4
                     30         -6.02       1.45     -1.29     1.6
                     40         -3.87       2.21     -1.41     1.2

Note. N = 40 in all conditions.


Table 15

Summary Statistics for Analytical Unidimensional Estimates of Form B Item
Parameters

Parameter         Condition    Minimum    Maximum     Mean     SD

Compensatory Parameters

Discrimination       10          0.23       2.10      0.66     0.4
                     20          0.17       1.13      0.57     0.2
                     30          0.15       1.52      0.62     0.3
                     40          0.24       1.35      0.64     0.3

Difficulty           10         -1.71       1.96     -0.14     1.0
                     20         -2.30       1.93      0.07     1.0
                     30         -1.74       2.03     -0.01     1.0
                     40         -1.86       2.12     -0.10     0.9

Noncompensatory Parameters

Discrimination       10          0.23       2.10      0.68     0.4
                     20          0.10       2.10      0.56     0.3
                     30          0.19       1.10      0.52     0.2
                     40          0.14       0.96      0.52     0.2

Difficulty           10         -3.08       1.96     -0.47     1.1
                     20         -3.97       1.80     -1.07     1.4
                     30         -3.49       1.45     -1.15     1.3
                     40         -4.53       1.18     -1.45     1.2

Note. N = 40 in all conditions.


Additional simulee sets were generated to represent low ability
examinees for all conditions. Five replications of each response set were
generated. The summary statistics for these data are contained in Table 20.
For θ1, the values ranged from approximately -3.5 to 0.0 with all means around
-0.8 and standard deviations of 0.6. The θ2 values were normally distributed
and ranged from -3.00 to +3.00 with mean 0.0 and standard deviation 1. This
would be expected with uncorrelated theta abilities.

Table 16

Descriptive Statistics for Simulated Examinees Taking MD10

                THETA 1                            THETA 2
Rep     Low     High     Mean     SD       Low     High     Mean     SD

 1     -2.89    3.02     0.07    0.96     -3.80    3.13    -0.01    1.01
 2     -3.19    3.15     0.04    1.00     -2.94    3.03    -0.03    1.02
 3     -3.62    2.79    -0.06    1.00     -3.02    2.62     0.02    1.00
 4     -2.87    3.33    -0.01    0.99     -3.26    2.98     0.05    1.03
 5     -2.98    3.65     0.03    0.99     -3.05    3.16    -0.01    0.99
 6     -3.13    2.75    -0.02    1.00     -3.05    3.56     0.01    0.99
 7     -2.98    3.63     0.01    0.98     -3.46    2.77    -0.01    0.97
 8     -2.97    3.19     0.01    1.03     -3.42    4.03     0.03    0.98
 9     -3.76    3.24    -0.02    0.97     -2.96    3.15     0.00    0.99
10     -4.02    2.96    -0.01    1.02     -2.68    3.33     0.00    1.03
11     -3.97    3.28    -0.02    1.03     -3.11    3.58    -0.03    1.03
12     -3.29    3.05    -0.06    1.02     -3.28    3.23     0.01    1.01
13     -3.11    3.57    -0.01    0.96     -3.38    4.00    -0.01    1.06
14     -2.86    3.07     0.03    0.97     -3.20    3.15    -0.02    1.03
15     -3.91    3.39    -0.01    0.99     -3.43    3.22    -0.01    1.03
16     -3.53    2.94     0.01    0.99     -4.28    3.08    -0.03    1.01
17     -3.18    2.76     0.01    0.97     -3.32    3.01     0.03    1.00
18     -4.30    3.71    -0.01    1.03     -3.62    3.26    -0.02    1.02
19     -3.61    2.80    -0.02    0.99     -3.27    3.35     0.00    1.00
20     -2.84    2.97     0.03    1.02     -2.97    3.18     0.00    1.00

Note. N = 1000 for each replication.




Table 17

Descriptive Statistics for Simulated Examinees Taking MD20

                THETA 1                            THETA 2
Rep     Low     High     Mean     SD       Low     High     Mean     SD

 1     -2.96    3.50     0.00    0.98     -3.30    3.00    -0.05    0.99
 2     -3.30    3.58     0.02    1.02     -2.97    3.06     0.02    1.01
 3     -3.18    3.91     0.02    0.98     -2.98    3.73     0.02    1.03
 4     -3.38    3.24    -0.02    1.00     -3.05    3.64    -0.02    0.99
 5     -3.35    3.42    -0.05    1.01     -3.27    2.98     0.01    1.01
 6     -3.07    3.22    -0.04    1.01     -2.77    3.89     0.01    0.96
 7     -4.34    3.34    -0.02    1.00     -4.05    3.17     0.01    1.03
 8     -3.14    3.28     0.01    0.99     -3.15    3.16     0.05    1.01
 9     -3.09    3.30    -0.02    0.98     -2.50    3.72    -0.03    1.00
10     -3.13    3.50     0.00    0.98     -4.55    2.89    -0.02    1.00
11     -3.12    2.80     0.01    0.97     -3.32    3.52    -0.05    0.98
12     -2.88    3.48     0.04    1.01     -2.95    2.98     0.00    1.00
13     -3.33    2.97    -0.01    1.01     -3.47    3.86     0.01    1.05
14     -3.20    3.75    -0.05    0.98     -3.08    2.73    -0.01    0.98
15     -3.96    2.93     0.04    0.99     -3.49    3.31    -0.01    1.02
16     -2.84    3.71     0.00    1.00     -2.97    3.27     0.00    0.97
17     -3.17    3.54    -0.01    0.98     -3.71    3.92    -0.04    1.04
18     -2.98    2.94     0.01    0.97     -3.54    3.37    -0.04    0.97
19     -3.62    3.11    -0.04    0.98     -3.17    3.22    -0.02    1.03
20     -2.84    3.44     0.00    0.97     -3.11    3.47    -0.04    0.97

Note. N = 1000 for each replication.


Table 18

Descriptive Statistics for Simulated Examinees Taking MD30

                THETA 1                            THETA 2
Rep     Low     High     Mean     SD       Low     High     Mean     SD

 1     -3.79    3.40    -0.06    1.03     -3.23    2.81    -0.03    0.99
 2     -3.40    3.00     0.00    1.02     -3.34    3.88    -0.06    0.99
 3     -3.28    3.15    -0.02    1.04     -3.07    2.93    -0.02    0.99
 4     -3.07    3.09     0.02    1.01     -3.14    3.48     0.04    1.00
 5     -3.76    3.32    -0.04    1.02     -3.08    3.29     0.02    1.01
 6     -2.82    4.18    -0.01    1.01     -2.85    3.03     0.08    0.98
 7     -3.35    3.13     0.03    0.97     -3.55    3.12     0.02    0.99
 8     -2.51    3.54    -0.02    1.02     -3.11    3.24     0.03    0.98
 9     -3.60    4.23    -0.04    1.03     -3.06    3.38     0.03    1.05
10     -3.08    2.84     0.01    0.96     -2.96    3.03    -0.02    1.03
11     -2.70    3.21     0.04    1.00     -3.04    3.28    -0.01    0.99
12     -3.04    3.25     0.01    0.99     -3.05    3.19     0.07    0.99
13     -3.26    3.91    -0.02    1.02     -4.00    3.22     0.04    0.98
14     -3.09    4.19     0.03    1.01     -3.04    3.15     0.06    0.98
15     -3.22    2.46     0.02    0.98     -3.54    3.84    -0.01    1.01
16     -3.84    3.41    -0.02    1.00     -3.16    2.97    -0.01    0.97
17     -3.26    3.08     0.03    0.98     -3.14    2.83     0.01    0.98
18     -2.67    3.26    -0.01    1.01     -3.47    3.06     0.02    1.02
19     -4.00    3.15     0.01    1.07     -3.73    2.92    -0.05    1.05
20     -3.23    3.37     0.01    1.04     -2.81    2.91    -0.05    0.98

Note. N = 1000 for each replication.

Table 19

Descriptive Statistics for Simulated Examinees Taking MD40

                THETA 1                            THETA 2
Rep     Low     High     Mean     SD       Low     High     Mean     SD

 1     -2.85    3.79     0.02    0.96     -3.26    2.73    -0.04    0.98
 2     -3.21    2.99     0.00    1.02     -3.57    3.18     0.00    1.00
 3     -3.40    2.87    -0.03    1.03     -3.73    3.47     0.03    0.97
 4     -3.30    3.17     0.01    1.03     -2.89    3.18     0.03    1.02
 5     -3.12    3.59    -0.01    1.03     -2.78    2.81    -0.01    0.97
 6     -3.58    3.54    -0.02    1.00     -3.27    3.14     0.05    1.01
 7     -3.09    3.26    -0.04    1.00     -3.09    3.13     0.01    1.05
 8     -3.40    2.75    -0.02    1.01     -2.81    3.37    -0.04    1.01
 9     -3.60    4.23    -0.04    1.03     -3.06    3.38     0.03    1.05
10     -3.51    3.69    -0.05    1.02     -2.77    2.87    -0.01    0.97
11     -3.24    2.83     0.03    1.05     -3.45    2.74    -0.01    1.00
12     -3.81    2.66    -0.04    1.02     -2.99    3.28     0.00    0.98
13     -3.10    2.77    -0.02    1.02     -2.91    2.56     0.00    1.00
14     -3.05    3.44    -0.03    1.00     -3.27    3.29     0.00    1.02
15     -3.44    3.13     0.04    1.02     -3.82    3.43     0.00    1.01
16     -3.03    3.37    -0.02    0.98     -3.59    2.88    -0.03    0.99
17     -3.68    3.63     0.02    1.03     -3.51    3.96     0.02    0.97
18     -2.73    3.17     0.01    0.98     -2.83    3.27     0.01    1.01
19     -3.59    3.37     0.02    1.03     -2.99    3.35     0.01    0.96
20     -2.69    2.59    -0.02    0.95     -3.29    3.51    -0.01    1.04

Note. N = 1000 for each replication.

Equating Results for Randomly Equivalent Groups
Concurrent Calibration

Response data for all equivalent groups were used by BILOG386 to calibrate
the item and ability parameters for both Forms A and B simultaneously.
Twelve common items linked the forms, resulting in parameter estimates on the
same scale. The equated ability estimates were then compared with the three
comparison conditions: the simulated θ1, the average of the simulated θ1 and θ2,
and the analytical unidimensional estimates. Correlations (ρ), standardized
differences between means (SDM), and standardized root mean squared
difference (SRMSD) statistics were calculated for each form. Table 21 presents
results summarized across the 20 replications for compensatory data. For both
Forms A and B, the strength of the correlation between the estimated θs and the
simulated θ1s decreases as the number of multidimensional items in the test
increases. Conversely, the strength of the correlation between the estimated θs
and the average of the simulated θ1s and θ2s increases as the number of
multidimensional items increases. The same pattern of decreasing desirability is
repeated in the SRMSD values. This agrees with earlier findings that as
multidimensionality in a test becomes more predominant, the abilities should be
viewed as a composite of the underlying traits (Ackerman, 1989; Way et al.,
1988; Yen, 1984). The average of the abilities seems to recover this composite
more accurately than simply using the dominant first dimension to define the
multidimensional data. There is virtually no difference in the results for Forms A
and B.

Table 20

Descriptive Statistics for Simulated Low Ability Examinees

                THETA 1                            THETA 2
Rep     Low     High     Mean     SD       Low     High     Mean     SD

MD10
 1     -3.19    0.09    -0.73    0.60     -3.80    3.13    -0.05    1.02
 2     -3.62   -0.03    -0.82    0.58     -3.19    2.98     0.01    1.04
 3     -3.13   -0.02    -0.79    0.58     -2.91    3.16     0.02    0.98
 4     -2.98    0.04    -0.80    0.59     -3.46    4.03     0.02    1.00
 5     -4.02   -0.02    -0.81    0.61     -2.96    3.15    -0.00    1.02

MD20
 1     -3.13    0.00    -0.78    0.61     -3.30    3.00    -0.06    0.96
 2     -3.30    0.02    -0.80    0.62     -2.92    3.42    -0.02    1.01
 3     -3.18    0.00    -0.77    0.59     -3.15    3.73     0.04    1.02
 4     -4.34   -0.02    -0.82    0.60     -4.05    2.87    -0.04    1.00
 5     -3.35   -0.04    -0.86    0.60     -2.83    2.98     0.02    1.00

MD30
 1     -3.79    0.01    -0.84    0.64     -3.24    3.88    -0.05    1.02
 2     -3.28   -0.01    -0.82    0.62     -3.14    3.48     0.01    1.01
 3     -3.76   -0.03    -0.82    0.64     -3.08    3.29     0.06    0.98
 4     -3.35    0.01    -0.79    0.58     -3.55    3.24     0.05    1.00
 5     -3.60    0.02    -0.80    0.60     -3.01    3.03    -0.02    1.02

MD40
 1     -3.10   -0.02    -0.80    0.59     -2.85    2.13    -0.01    1.02
 2     -3.44    0.02    -0.78    0.60     -2.97    2.75     0.01    1.03
 3     -3.51    0.01    -0.81    0.62     -3.82    2.27    -0.02    0.98
 4     -3.24    0.01    -0.80    0.61     -3.47    2.38    -0.04    1.00
 5     -3.81    0.02    -0.83    0.62     -2.99    2.91     0.04    0.96

Note. N = 1000 for each replication.

[Table 21. Rotated table; its contents are not recoverable from this copy.]

When  examining  the  results  of  comparisons  with  the  analytical 
estimations,  outcomes  are  quite  different.  For  all  conditions,  correlations  are 
similar  and  very  high  at  .97  or  .98.  Similarly,  the  SRMSD  values  are 
approximately  equal  across  conditions  and  are  relatively  low.  Although  the  data 
were  simulated  in  a  manner  that  conforms  to  the  theory  behind  the  analytical 
approximation  procedure,  differences  in  the  amount  of  multidimensionality  may 
have  had  an  effect  on  analytical  estimates.  For  concurrent  calibration,  the 
analytical  estimation  procedure  appears  to  describe  the  unidimensional  equating 
with  multidimensional  data  well,  and  it  is  not  affected  by  the  amount  of 
dimensionality  in  the  test.  Results  are  almost  identical  for  both  forms. 

The application of the SDM statistic provided comparable results across
all conditions and baselines. All values are close to 0.0, denoting close
agreement of the means of the two forms. Negative numbers indicate the
estimated θs were slightly higher than those in the baseline conditions. No
discernible pattern emerges from examination of these data.
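
The three evaluation criteria used throughout this chapter can be sketched in a few lines of Python. This is a minimal illustration only. The sign convention (comparison value minus estimate) follows the description in the text, but standardizing by the standard deviation of the comparison values is an assumption; the definitions in Chapter 3 govern. The function names and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def sdm(comparison, estimate):
    """Standardized difference between means (comparison minus estimate).

    Assumption: standardized by the SD of the comparison values.
    """
    return (mean(comparison) - mean(estimate)) / pstdev(comparison)


def srmsd(comparison, estimate):
    """Standardized root mean squared difference (same standardizer)."""
    msd = mean([(c - e) ** 2 for c, e in zip(comparison, estimate)])
    return sqrt(msd) / pstdev(comparison)


# Hypothetical values: every estimate sits 0.5 above its comparison value.
comparison = [-1.0, 0.0, 1.0, 2.0]
estimate = [c + 0.5 for c in comparison]

r = pearson_r(comparison, estimate)
d = sdm(comparison, estimate)       # negative: estimates are higher
e = srmsd(comparison, estimate)
```

In this constructed case the ordering of examinees is preserved exactly, so the correlation is perfect even though the SDM is negative and the SRMSD is well above zero.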

Results  calculated  from  the  noncompensatory  conditions  display  similar 
trends. Strengths of correlations decrease and SRMSDs increase as the
number of multidimensional items increases. The analytical estimation results
are  approximately  equal  and  high  for  all  conditions.  Like  the  compensatory 
outcomes,  the  SDM  statistic  is  around  0.0  for  all  conditions.  No  differences  are 
seen  between  forms. 

Comparisons  between  the  compensatory  and  noncompensatory  models 
reveal some variations. For MD10, the correlations with θ1 are equal for both
models. For MD20 and MD30, the correlation is slightly higher for the
noncompensatory model than for the compensatory model, but is slightly lower
for MD40. However, for all tests, the correlations with (θ1 + θ2)/2 are slightly
lower  with  the  noncompensatory  model  than  with  the  compensatory  model.  The 
greatest  difference  is  found  in  MD10  with  .88  for  the  compensatory  model  and 
.80  for  the  noncompensatory  model.  The  correlations  with  analytical  estimations 
are  equal  for  comparable  conditions  in  both  models. 

The  SRMSD  results  show  similar  variations  but  in  opposite  directions. 
The  SDM  is  approximately  equal  for  corresponding  conditions  in  both  the 
compensatory  and  noncompensatory  conditions. 


Equated  bs 

The  means  and  standard  deviations  of  the  bs  for  the  twelve  linking  items 
on  each  form  were  calculated.  These  were  used  in  equations  20  and  21  in 
Chapter  3  to  derive  the  slope  and  intercept  constants  presented  in  Tables  22 
and 23. Their values were similar, with slopes not deviating far from 1.00 and
intercepts  close  to  0.0.  The  linear  transformations  were  applied  to  the  Form  B 
estimated  abilities  and  comparisons  to  the  Form  A  baselines  were  made. 
Because  a  new  potential  source  of  error  was  being  introduced  with  the  equating 
constants,  results  for  this  procedure  were  expected  to  be  less  precise  than 
those  produced  by  concurrent  calibration. 
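
A minimal sketch of this linking step is given below. It assumes that equations 20 and 21 take the standard mean/sigma form, in which the slope is the ratio of the standard deviations of the linking-item difficulties and the intercept aligns their means; the equations themselves are not reproduced in this chapter and may differ in detail. The function names and the item difficulties are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def mean_sigma_constants(b_base, b_new):
    """Slope and intercept placing b_new on the scale of b_base.

    Assumed mean/sigma form:
        slope     = sd(b_base) / sd(b_new)
        intercept = mean(b_base) - slope * mean(b_new)
    """
    slope = stdev(b_base) / stdev(b_new)
    intercept = mean(b_base) - slope * mean(b_new)
    return slope, intercept


def to_base_scale(values, slope, intercept):
    """Apply the linear transformation to abilities or difficulties."""
    return [slope * v + intercept for v in values]


# Hypothetical difficulties for the same four linking items on each form.
b_form_a = [-1.0, 0.0, 0.5, 1.5]
b_form_b = [-0.5, 0.0, 0.25, 0.75]

slope, intercept = mean_sigma_constants(b_form_a, b_form_b)

# Form B ability estimates expressed on the Form A scale.
equated_thetas = to_base_scale([-1.0, 0.0, 1.0], slope, intercept)
```

Because the slope and intercept are themselves estimated from only twelve linking items per form in the study, any error in them carries over to every transformed ability, which is the additional source of error noted above.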

Results  for  the  equated  bs  equatings  for  equivalent  groups  are  shown  in 
Table  24.  The  outcomes  of  this  procedure  generated  criteria  patterns  similar  to 
those  found  with  concurrent  calibration.  As  the  number  of  multidimensional 
items  in  a  test  form  increases,  the  strength  of  the  correlation  of  equated  ability 
estimates with θ1 decreases and their relationship with the average abilities
increases. The correlations of the analytical estimations with the equated bs
data are similar for all conditions and are very high (ρ = .97 or .98). The
SRMSD statistic also produces patterns like those found for concurrent
calibration, increasing in comparisons with θ1 as multidimensionality increases,
and decreasing in comparisons with the θavg comparison condition. Correlations
are  approximately  equal  across  conditions  when  the  analytical  estimations  are 
used  as  the  comparison. 


Table 22

Constants for Equated bs Equating of Compensatory Forms with Randomly Equivalent
Groups

            MD10               MD20               MD30               MD40
Rep   Slope  Intercept   Slope  Intercept   Slope  Intercept   Slope  Intercept

  1    1.04      -0.08    1.07       0.06    0.93       0.03    1.08       0.07
  2    0.94      -0.04    0.94      -0.04    0.94      -0.04    0.94      -0.04
  3    1.07       0.09    1.14       0.06    1.03       0.06    1.06       0.04
  4    0.90      -0.01    1.07       0.09    1.04      -0.15    1.02      -0.06
  5    1.13      -0.01    0.98      -0.07    0.96       0.13    0.96       0.08
  6    0.94       0.04    1.00       0.05    0.88      -0.06    1.02      -0.14
  7    1.09       0.00    1.03       0.09    1.06      -0.06    1.02       0.04
  8    0.90      -0.06    1.07       0.03    1.07       0.00    0.94       0.04
  9    1.09      -0.02    1.06       0.03    0.96       0.04    0.99      -0.13
 10    1.02       0.02    0.91      -0.04    0.99       0.05    1.01       0.08
 11    1.04      -0.01    1.12       0.12    0.91      -0.10    1.04      -0.03
 12    0.98       0.06    1.09      -0.04    1.03       0.03    1.02       0.01
 13    1.01       0.00    0.93      -0.03    0.95       0.09    1.02       0.00
 14    1.01      -0.04    1.08       0.09    1.00      -0.12    1.10       0.10
 15    1.04       0.02    0.94       0.06    1.04      -0.02    0.91      -0.09
 16    1.01       0.09    1.05      -0.02    0.97       0.02    1.06       0.04
 17    1.05      -0.04    1.00       0.00    1.03       0.02    1.04       0.01
 18    0.90      -0.01    0.94      -0.45    0.98      -0.03    0.98       0.04
 19    1.17       0.06    0.83       0.02    1.03       0.00    1.08      -0.07
 20    1.00       0.05    1.19       0.09    1.02      -0.14    0.93       0.05

In  comparing  the  results  of  the  equated  bs  procedure  with  those  of 
concurrent  calibration,  some  small  differences  can  be  noted  but  they  do  not  form 
a  consistent  pattern.  In  general,  the  results  are  similar  to  those  found  for 
concurrent  calibration.  The  results  are  also  similar  for  compensatory  and 
noncompensatory  data. 


Table 23

Constants for Equated bs Equating of Noncompensatory Forms with Randomly
Equivalent Groups

            MD10               MD20               MD30               MD40
Rep   Slope  Intercept   Slope  Intercept   Slope  Intercept   Slope  Intercept

  1    1.04       0.03    1.01      -0.04    0.93      -0.01    0.89       0.05
  2    0.94      -0.04    0.94      -0.04    0.94      -0.04    0.94      -0.04
  3    0.94      -0.04    1.12      -0.12    0.93      -0.04    1.08       0.07
  4    1.05       0.07    0.88      -0.01    0.92      -0.05    0.97       0.02
  5    1.00       0.03    0.99      -0.02    1.05       0.07    0.91       0.05
  6    0.98      -0.07    1.01       0.01    0.90      -0.13    1.15      -0.07
  7    1.09       0.04    0.97       0.06    1.01       0.11    0.96      -0.06
  8    0.98      -0.05    0.97      -0.05    1.01       0.00    1.06       0.13
  9    0.96      -0.01    1.20      -0.11    0.92      -0.15    0.99      -0.10
 10    1.10      -0.01    0.71       0.14    1.09       0.12    1.04       0.12
 11    1.02      -0.03    0.77       0.13    0.90      -0.02    1.06      -0.15
 12    0.94       0.09    1.23      -0.19    0.98       0.05    1.09       0.15
 13    1.06      -0.01    0.79      -0.03    1.01       0.03    0.94      -0.02
 14    0.98      -0.06    1.47       0.09    0.89      -0.10    1.07       0.12
 15    1.08       0.07    0.87      -0.06    0.99       0.04    0.96      -0.04
 16    1.02       0.05    0.94      -0.02    1.04       0.04    1.01       0.06
 17    0.89      -0.04    0.86       0.15    0.83      -0.02    1.07      -0.01
 18    1.08      -0.04    1.01      -0.11    1.14      -0.01    0.98       0.05
 19    1.15       0.09    1.23      -0.10    0.89      -0.05    1.13      -0.04
 20    0.97       0.04    0.78       0.14    1.04      -0.01    1.00      -0.01

[Table 24. Results of the equated bs equatings for randomly equivalent groups.
The rotated table body is not recoverable from the scanned source.]

Characteristic  Curve  Transformation 

Item  parameters  for  Form  A  and  Form  B  which  had  been  calibrated 
separately  by  BILOG386  were  also  analyzed  by  an  iterative  process  that 
minimized  the  differences  in  the  item  parameters  of  the  linking  items.  The 
resulting  transformation  was  applied  to  the  Form  B  ability  estimates,  placing 
them  on  the  same  scale  as  the  Form  A  abilities.  Because  this  process  includes 
information  from  the  discrimination  parameters  as  well  as  the  difficulties,  it  was 
anticipated  that  the  transformation  would  be  more  affected  by  the  presence  of 
multidimensionality  than  either  of  the  two  preceding  equating  procedures. 
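
The general idea of such an iterative transformation can be sketched as follows. The text does not state which loss function or search routine was used, so this sketch substitutes a test characteristic curve criterion in the style of Stocking and Lord together with a crude shrinking grid search; both choices are assumptions made for illustration, as are the function names and item parameters.

```python
from math import exp


def p2pl(theta, a, b):
    """Two-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))


def tcc_loss(slope, intercept, items_a, items_b, grid):
    """Squared gap between the linking-item test characteristic curves.

    Form B parameters are moved to the Form A scale with
    a* = a / slope and b* = slope * b + intercept.
    """
    loss = 0.0
    for theta in grid:
        tcc_a = sum(p2pl(theta, a, b) for a, b in items_a)
        tcc_b = sum(p2pl(theta, a / slope, slope * b + intercept)
                    for a, b in items_b)
        loss += (tcc_a - tcc_b) ** 2
    return loss


def fit_constants(items_a, items_b, rounds=30):
    """Shrinking 5 x 5 grid search around the current best constants."""
    grid = [g / 2.0 for g in range(-6, 7)]        # theta from -3 to 3
    slope, intercept, step = 1.0, 0.0, 0.5
    for _ in range(rounds):
        candidates = [(slope + i * step, intercept + j * step)
                      for i in range(-2, 3) for j in range(-2, 3)
                      if slope + i * step > 0.05]   # keep the slope positive
        slope, intercept = min(
            candidates,
            key=lambda c: tcc_loss(c[0], c[1], items_a, items_b, grid))
        step *= 0.6
    return slope, intercept


# Hypothetical linking items as (a, b) pairs on the Form A scale.
items_a = [(1.0, -1.0), (1.2, 0.0), (0.8, 0.5), (1.5, 1.0)]

# The same items as they would appear on a Form B scale related to Form A by
# known constants, so the search has a target to recover.
true_slope, true_intercept = 1.25, -0.5
items_b = [(a * true_slope, (b - true_intercept) / true_slope)
           for a, b in items_a]

est_slope, est_intercept = fit_constants(items_a, items_b)
```

Unlike the equated bs approach, the loss here depends on the discriminations as well as the difficulties, which is why greater sensitivity to multidimensionality was anticipated.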

Table  25  presents  the  results  for  the  characteristic  curve  transformation 
equatings.  Contrary  to  expectations,  the  results  for  this  method  are  almost 
identical  to  those  for  the  equated  bs.  The  patterns  and  comparisons  found  for 
the  equated  bs  method  also  occur  for  the  characteristic  curve  transformation. 

Equating Results for Nonequivalent Groups

The  ability  level  of  an  examinee  group  taking  a  second  form  of  a  test  may 
differ  significantly  from  that  of  the  original  testing  group.  This  may  occur,  for 
example, in a state testing program where only those examinees failing their first
attempt  at  a  proficiency  test  will  take  the  second  form.  The  distribution  of 
abilities  is  thus  lower  for  this  second  group.  One  of  the  theoretical  advantages 
of  IRT  is  its  application  with  nonequivalent  groups. 

[Table 25. Results of the characteristic curve transformation equatings with
randomly equivalent groups. The rotated table body is not recoverable from the
scanned source.]

For all four multidimensional conditions, response data for the
compensatory model were generated to simulate two examinee groups that
were unequal in ability. Equating was performed with all three procedures. The
normally distributed examinee group was assigned to Form A and the low ability
group was matched with Form B. Comparison conditions and evaluation criteria
remained the same. To establish the analytical ability estimates for comparison,
the item parameters previously calculated for the four compensatory conditions
were fixed in BILOG386 and used to calibrate the ability parameters for the low
group. Because the standard deviations across the twenty replications for the
randomly equivalent groups were so small, only five repetitions were conducted
for each experimental condition with nonequivalent groups. Table 26 presents a
summary of the results of the three equatings with data from nonequivalent
examinee groups.
Concurrent  Calibration 

The correlations of the equated ability estimates with the simulated θ1s
decrease as the number of multidimensional items increases. The difference is
especially noticeable between MD10 and MD20. The SDM is negative and
substantially different from zero for all conditions. Because this statistic
subtracts the equated ability estimate from the simulated ability, a negative
value indicates the mean of the equated ability estimates is much higher than
the comparison condition. A high positive SRMSD value is found in all
conditions, another indication of the large difference between the comparison
conditions and the equated ability estimates associated with each individual
examinee. In all cases, the correlation coefficients were lower, the SDMs much
lower, and the SRMSDs much higher than the corresponding results for
concurrent calibration equating with the randomly equivalent examinee groups.

Table 26

Summary of Equating Results with Nonequivalent Groups

               Correlation                    SDM                      SRMSD
Condition    θ1  (θ1+θ2)/2    AEa      θ1  (θ1+θ2)/2    AEa      θ1  (θ1+θ2)/2    AEa

Concurrent Calibration
   10      0.84       0.58   0.94   -0.79      -0.20  -0.75    0.85       0.84   0.66
   20      0.67       0.85   0.95   -0.84      -0.28  -0.72    0.89       0.64   0.64
   30      0.60       0.88   0.95   -0.86      -0.29  -0.62    0.91       0.63   0.60
   40      0.57       0.91   0.97   -0.73      -0.19  -0.46    0.96       0.57   0.53

Equated bs
   10      0.85       0.63   0.95   -0.03       0.47  -0.03    0.57       0.89   0.43
   20      0.65       0.85   0.95   -0.10       0.47  -0.01    0.65       0.80   0.45
   30      0.58       0.90   0.95   -0.13       0.34  -0.07    0.77       0.57   0.43
   40      0.51       0.92   0.96   -0.20       0.44   0.01    0.89       0.59   0.42

Characteristic Curve Transformation
   10      0.85       0.63   0.95   -0.03       0.47  -0.03    0.57       0.90   0.41
   20      0.65       0.85   0.94   -0.11       0.45  -0.01    0.74       0.78   0.39
   30      0.58       0.90   0.95   -0.12       0.33  -0.10    0.84       0.55   0.46
   40      0.51       0.92   0.96   -0.18       0.45   0.02    0.97       0.61   0.37

Note. Means of 5 replications for each condition.
a AE = Analytical Estimation

Correlations of equated abilities with the average of simulated θ1 and θ2
increase with the number of multidimensional items. The SDM values do not
show as serious a departure from zero with the average of simulated θ1 and θ2
as is found with the θ1 comparison. Furthermore, the increase in SRMSDs is
not as marked with the ability average comparison condition. Examination of
the data for nonequivalent examinee groups reveals lower correlations, lower
SDMs, and higher SRMSDs across all conditions than those found for
equivalent groups. This seems to suggest that of the two conditions comparing
the equated ability estimates with simulated thetas, the average of the two
abilities is a better descriptor of the underlying multidimensional relationship
than is θ1.

Inspection of the concurrent calibration outcomes in comparison with the
analytic estimates produces some striking results. As was found with the
equivalent groups, the correlation coefficients are very high and consistent
across all multidimensionality conditions. Unlike the equivalent group results,
the SDMs reveal large differences between the means of the equated ability
sets and means of sets of analytical estimates. In all cases, the equated ability
estimates are higher. Similarly, the SRMSDs are higher for the nonequivalent
examinee groups. This may indicate that although the relationship between the
analytical estimates and the concurrent calibration estimates is almost perfect in
an ordinal sense, the scaling of the two sets of data is different. It appears that
the concurrent calibration, because it included the 1,000 examinees of normally
distributed abilities, raised the ability estimates of the combined groups. This
would be an advantage to an examinee of low ability.
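
The distinction between agreement in an ordinal sense and agreement in scale can be illustrated with a short sketch. The values are invented for illustration and are not taken from the study; the shift and stretch applied to them are arbitrary assumptions.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical analytical estimates for five low-ability examinees.
analytical = [-2.0, -1.5, -1.0, -0.5, 0.0]

# Hypothetical equated estimates: same ordering, but on a shifted and
# slightly compressed scale.
equated = [0.9 * t + 0.7 for t in analytical]

r = pearson_r(analytical, equated)
mean_gap = mean(analytical) - mean(equated)   # negative: equated values higher
```

A positive linear change of scale leaves the correlation untouched while moving the means apart, which is the pattern of a high correlation paired with a large negative SDM described above.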

The  one  exception  to  this  pattern  of  results  is  found  in  the  MD40 
condition.  For  the  test  containing  all  multidimensional  items,  the  SRMSD  is 
smaller  than  in  the  other  three  conditions.  This  is  also  true  with  the  ability 
average  comparison  condition.  Furthermore,  the  SDMs  are  closer  to  zero  for 
MD40. This may imply that when the multidimensionality becomes more
pervasive, the concurrent calibration of abilities is slightly more accurate.
Equated  bs  and  Characteristic  Curve  Transformation 
As  in  the  case  with  equivalent  group  equating,  the  results  for  the  equated  bs 
and  characteristic  curve  transformation  procedures  are  nearly  identical  so 
they  will  be  discussed  together.  Constants  for  the  equated  bs  linking  items  are 
shown  in  Table  27. 

The correlation coefficients of equated abilities with θ1 decrease and
those with the average of θ1 and θ2 increase as the amount of
multidimensionality in a test increases. The correlations with the analytical
estimates are again high and consistent. For the nonequivalent examinee
groups, the SDMs for the θ1 and analytical estimation comparisons are closer to
zero than they were for concurrent calibration. An unusual occurrence is seen
with the standardized difference between means (SDM) for the theta average
comparison conditions. The values are substantial and positive,
indicating the estimated abilities were lower than the average of the two
simulated traits. This is opposite to the results for the concurrent calibration.

Table 27

Constants for Equated bs Equating of Compensatory Forms with Nonequivalent
Examinee Groups

            MD10               MD20               MD30               MD40
Rep   Slope  Intercept   Slope  Intercept   Slope  Intercept   Slope  Intercept

  1    0.68      -0.81    0.65      -0.71    0.71      -0.63    0.70      -0.71
  2    0.64      -0.70    0.62      -0.74    0.67      -0.57    0.70      -0.62
  3    0.58      -0.71    0.65      -0.68    0.76      -0.65    0.69      -0.68
  4    0.59      -0.76    0.59      -0.68    0.73      -0.62    0.62      -0.66
  5    0.69      -0.89    0.45      -0.80    0.79      -0.62    0.63      -0.59

The  SRMSDs  are  also  similar  for  equated  bs  and  characteristic  curve 
transformation procedures. When calculated using θ1, the SRMSD generally
increases with an increase in the number of multidimensional items, but
decreases when calculated by using the average of θ1 and θ2. When calculated
by  using  the  analytical  estimates,  the  SRMSDs  are  fairly  consistent  across 
conditions  for  equated  bs,  although  more  variability  is  noted  with  the 
characteristic  curve  transformation  data. 

In general, when performed on data from nonequivalent examinee
groups, the equating procedures studied produced less than optimal equating
results. The differences between the equated ability estimates and the
simulated abilities in the comparison conditions were larger for nonequivalent
examinee groups than for randomly equivalent groups. The concurrent
calibration procedure, due to the presence of the normally distributed group, led
to departures from the SDM and SRMSD comparison conditions, although the
correlations indicated the ranking of examinees was still fairly similar.


CHAPTER  5 
CONCLUSIONS 


The  purpose  of  this  study  was  to  investigate  the  effects  of  ignoring  the 
presence  of  multidimensional  items  when  applying  unidimensional  IRT  equating 
procedures.  The  specific  effects  of  interest  were  (a)  data  generated  with  the 
compensatory  and  noncompensatory  models,  (b)  the  IRT  equating  method 
chosen,  (c)  the  number  of  multidimensional  items  present  in  the  test,  and  (d)  the 
equivalence  of  the  ability  levels  of  examinee  groups.  Data  were  simulated  and 
equated  and  the  results  were  compared  against  three  comparison  criteria  using 
three  statistical  indicators. 

Effects  of  Multidimensional  Model 

Data  for  the  four  multidimensional  conditions  were  simulated  using  a 
compensatory  M2PL  model  (Reckase,  1985).  Each  compensatory  item  set  was 
transformed  to  corresponding  noncompensatory  item  parameters  so  the 
differences  in  the  probability  of  a  correct  response  for  the  two  models  by  a  given 
examinee  were  minimized  (Ackerman,  1989).  Very  similar  results  were  achieved 
for  both  models  with  all  three  IRT  equating  procedures  using  data  from  randomly 
equivalent  examinee  groups.  When  the  multidimensional  data  are  matched  so 
the  probability  of  a  correct  response  is  equal,  there  is  little  effect  of  model  on  the 
ensuing  equatings,  even  though  the  IRSs  may  differ  for  individual  items. 
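
The difference between the two model families can be made concrete with a short sketch. The compensatory function follows the usual M2PL form with two discriminations and an intercept d. The noncompensatory function uses a common product form with a separate difficulty on each dimension; this is an assumption, since the exact parameterization used in the study is given in an earlier chapter. All parameter values are hypothetical.

```python
from math import exp


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + exp(-z))


def p_compensatory(theta1, theta2, a1, a2, d):
    """M2PL: the two abilities combine additively inside one logistic."""
    return logistic(a1 * theta1 + a2 * theta2 + d)


def p_noncompensatory(theta1, theta2, a1, a2, b1, b2):
    """Product of per-dimension logistics (assumed form): a deficit on one
    trait cannot be offset by a surplus on the other."""
    return logistic(a1 * (theta1 - b1)) * logistic(a2 * (theta2 - b2))


# Two hypothetical examinees with the same ability total but different profiles.
balanced = (0.0, 0.0)
unbalanced = (2.0, -2.0)

comp_balanced = p_compensatory(*balanced, a1=1.0, a2=1.0, d=0.0)
comp_unbalanced = p_compensatory(*unbalanced, a1=1.0, a2=1.0, d=0.0)

nonc_balanced = p_noncompensatory(*balanced, a1=1.0, a2=1.0, b1=-1.0, b2=-1.0)
nonc_unbalanced = p_noncompensatory(*unbalanced, a1=1.0, a2=1.0, b1=-1.0, b2=-1.0)
```

With equal discriminations, the compensatory model gives both examinees the same probability, whereas the product form penalizes the unbalanced profile. The response surfaces therefore differ for individual items even when the parameters are matched so that the probabilities agree as closely as possible over the examinee population.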


Effects  of  Equating  Method 

Concurrent  calibration,  equated  bs,  and  the  characteristic  curve 
transformation  were  investigated.  Discussion  of  results  for  randomly  equivalent 
examinee  groups  will  be  presented  here.  Results  for  groups  differing  in  mean 
ability  will  be  presented  in  another  section. 

Few  differences  are  seen  in  the  results  obtained  for  each  of  the  three 
methods.  The  standardized  difference  between  means  (SDM)  statistics  are 
almost  identical  for  corresponding  comparison  conditions  at  each  level  of 
multidimensionality  for  all  equating  methods.  The  same  similarities  are  also  found 
with  the  standardized  root  mean  squared  difference  (SRMSD)  statistics. 

All  results  for  equated  bs  and  characteristic  curve  transformation  equatings 
are  nearly  identical.  These  results  are  somewhat  unexpected.  Because  the 
characteristic  curve  transformation  procedure,  unlike  the  equated  bs,  includes 
more  information  by  using  the  item  discriminations,  it  seemed  reasonable  to 
believe  it  would  be  more  sensitive  to  the  presence  of  multidimensional  data.  It  is 
probable  that  the  unidimensional  parameter  estimation  process  with  BILOG386, 
the  first  step  in  both  equating  methodologies,  concealed  the  multidimensionality 
in  the  items.  Each  procedure  was  then  using  the  same  basic  data  for  the 
equating  process. 

Effects  of  the  Number  of  Multidimensional  Items 

Forty-item tests were generated with 10, 20, 30 or 40 multidimensional
items. The effects of the number of multidimensional items on the equating
results varied according to the comparison condition used. A review of these
comparison conditions is required before proceeding.

Two  baseline  conditions  employed  the  simulated  ability  parameters.  The 
first, θ1, was used because most tests are designed to measure only one
dominant factor. In this case, the first ability would be the trait of interest in the
test. The second comparison condition was the average of the simulated θ1 and
θ2. This condition was based on the work of Yen (1984) and Ansley and Forsyth
(1985)  in  which  the  unidimensional  estimates  of  the  ability  parameters  appeared 
to  be  combinations  of  the  simulated  multidimensional  traits.  The  third  comparison 
condition  was  calculated  using  the  unidimensional  approximation  of  the 
multidimensional  item  parameters  described  by  Wang  (1986).  The  resulting 
analytical  estimations  used  all  available  item  information  to  define  the  item 
parameters.  This  procedure  was  used  because  it  provided  an  alternative  method 
to  define  the  composite  of  multidimensional  traits. 

This discussion about the effects due to the number of multidimensional
items applies only to the equatings performed on data from examinee groups with
equivalent ability levels. The effects for nonequivalent examinee groups will be
presented in the next section. A change in the number of multidimensional items
yields results that vary greatly, but consistently, depending on the comparison
condition viewed. Comparisons with θ1 for all equating methods produce
correlation coefficients that decrease in strength as the number of
multidimensional items in a test increases. The SRMSD increases, revealing
greater differences between the comparison condition and equated abilities for
individual examinees as the number of multidimensional items increases. The SDM
values remain approximately 0.0 for all conditions and comparison conditions,
showing equal mean abilities for the two sets of data.

When the average of θ1 and θ2 is used as the condition for comparison,
the  differences  between  multidimensional  conditions  are  in  the  opposite  direction. 
Correlations  increase  and  the  SRMSDs  decrease  as  the  amount  of 
multidimensionality  increases.  These  findings  are  consistent  across  all  equating 
methods  and  for  both  multidimensional  IRT  models. 

The  results  seen  for  these  two  comparison  conditions  seem  reasonable. 
As  the  multidimensionality  becomes  more  pervasive,  the  second  ability  trait 
becomes  more  important.  The  average  of  the  two  abilities  then  describes  the 
relationship  more  accurately  than  the  single  dominant  trait.  However,  even 
though the correlation with θ1 decreases in the presence of more
multidimensional items, it remains moderately high (ρ = .78 or .74) in even the
most  extreme  condition,  MD40.  This  implies  the  first  factor  retains  a  dominant 
position  in  the  multidimensional  relationship. 

Comparisons  against  the  analytical  estimations  are  perhaps  even  more 
revealing.  The  correlations  yield  coefficients  of  .97  or  .98  for  all  multidimensional 
conditions  and  equating  methods.  The  SRMSD  statistics  are  consistent  and 
comparatively  low  across  all  levels  of  multidimensionality. 

If the dominant first factor is considered of main interest, the number of
multidimensional items in a test does have an effect. As the number of
multidimensional items increases, the equated traits appear to discriminate less on
the factor of interest. However, if a composite of the traits measured by a test is
of primary interest, the number of multidimensional items may not be as critical. If
the composite is defined as the average of the θs, more multidimensional items in
a test lead to better equated traits. There does not appear to be any effect of
multidimensional items on unidimensional equating when the criterion is the
analytical approximations. For data generated through application of a
compensatory multidimensional IRT model, the analytical approximations define
the θ composite equally well in all research conditions.

Effects  of  Nonequivalent  Examinee  Groups 
No  significant  effects  of  equating  method  or  number  of  multidimensional 
items  were  found  with  the  data  from  randomly  equivalent  groups  of  normally 
distributed  examinees.  Results  from  the  equatings  of  data  from  nonequivalent 
examinee  groups  raise  some  questions.  Comparisons  of  the  correlations  with 
the θ1 baseline for differing ability groups reveal the same direction of change as
was  found  for  the  equivalent  groups,  but  the  strength  of  these  relationships  is 
noticeably  lower.  However,  the  correlations  between  the  equated  abilities  and  the 
analytical  estimations  remain  strong  and  consistent  across  all  multidimensionality 
conditions.  The  SDM  statistics  are  of  interest  for  concurrent  calibration  because 
they  are  so  much  lower  than  was  previously  seen  for  all  equivalent  group 
conditions,  and  they  are  also  lower  than  the  corresponding  results  for  the  other 
two  equating  methods. 


With  the  nonequivalent  groups,  multidimensionality  becomes  more  of  a 
consideration.  The  dimensionality  of  a  given  test  is  not  merely  a  function  of  the 
item  parameters,  but  an  interaction  between  the  items  and  the  examinees  (Stout, 
1990).  This  appears  to  be  occurring  in  the  data  investigated  in  this  study.  Even  if 
the  test  itself  is  considered  essentially  unidimensional,  the  introduction  of  the 
nonequivalent  groups  seems  to  have  negated  any  tendency  for  the  equatings  to 
perform  well.    Concurrent  calibration  seems  most  affected  by  the  interaction  of 
the  multidimensional  items  and  dissimilar  examinee  groups. 

Implications 
The  results  of  this  study  failed  to  reveal  any  effect  on  unidimensional 
equating  results  associated  with  the  choice  of  multidimensional  model.  With 
randomly  equivalent  groups,  there  was  little  difference  attributable  to 
unidimensional equating procedure. The number of multidimensional items
had an effect if either the dominant first factor or the composite of traits defined
by the average is of interest. If the composite defined by the analytical
approximations  is  of  interest,  the  number  of  multidimensional  items  had  no  effect 
on  equating  results.  With  two  groups  who  differ  in  ability  levels,  there  was  an 
effect  of  both  equating  procedure  and  number  of  multidimensional  items.    Further 
investigation  is  needed  with  other  combinations  of  ability  levels  to  determine  what 
differences can be tolerated. In this study, the effect of model was not
investigated for nonequivalent groups. Although no differences were found due
to model with randomly
equivalent  groups,  further  research  is  needed  to  determine  the  effect  of  model  on 
nonequivalent  groups. 


The analytical estimation comparison condition produced excellent
consistency and performance throughout the variety of conditions. The
unidimensional approximations of the multidimensional item parameters seem to
provide an excellent description of the underlying multidimensional relationship
between the discriminations, difficulties, and latent traits. This procedure needs
to be studied further to determine whether it truly recovers the nature of the
multidimensional data or is simply mimicking the BILOG estimations.

The large mean differences displayed in the concurrent calibration of
nonequivalent groups suggest another area for further study. Because the
analytical estimates correlated highly with the equated abilities, even though the
SDMs were low, there is reason to believe the ranking of the examinees
remained relatively similar. Perhaps the calibration procedure was affected by
the vast differences in traits, resulting in ability estimates on a different scale but
in the same rank order.
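
The conjecture above, that ability estimates may land on a different scale while keeping the same rank order, can be illustrated with a minimal sketch. The ability values, slope, and intercept below are hypothetical and are not data from this study; a linear change of scale is assumed only for illustration. Under that assumption the correlation with the criterion stays at 1 and the ranking is unchanged, yet the mean difference is large.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def rank_order(values):
    """Indices of the examinees sorted from lowest to highest value."""
    return sorted(range(len(values)), key=lambda i: values[i])

# Hypothetical criterion abilities (not taken from the study).
criterion = [-1.5, -0.5, 0.0, 0.7, 1.8]

# Assumed linear distortion of the scale (slope and intercept are illustrative).
slope, intercept = 0.8, -1.2
equated = [slope * t + intercept for t in criterion]

r = pearson(criterion, equated)
mean_diff = mean(equated) - mean(criterion)
same_ranks = rank_order(criterion) == rank_order(equated)
print(f"r = {r:.3f}, mean difference = {mean_diff:.3f}, same ranks = {same_ranks}")
```

A high correlation alone therefore cannot show that two sets of estimates share a scale; a mean-difference index is needed as well.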

Results of this study are not meant to generalize to other IRT models or
equating methods. The degree and number of multidimensional items were
selected to be reasonable for practical testing situations, but they do not cover all
possible contingencies. Although there were no effects on unidimensional
equating found due to procedure or amount of multidimensionality present in the
data, caution is advised in applying these unidimensional IRT equating methods
to data gathered from examinee groups unequal in ability.


APPENDIX 
ITEM  PARAMETER  DATA 


Table 28

Simulated Compensatory Item Parameters for MD10 Form A

Item  Form   α      a1      a2       d   MDISC     MID
1     A,B    0   1.307   0.000   0.438   1.307  -0.335
2     A,B    0   1.328   0.000  -0.630   1.328   0.475
3     A,B    0   0.929   0.000   0.476   0.929  -0.512
4     A,B    0   0.631   0.000  -0.076   0.631   0.120
5     A,B    0   0.660   0.000   0.627   0.660  -0.950
6     A,B    0   1.928   0.000   0.031   1.928  -0.016
7     A,B    0   0.456   0.000   0.076   0.456  -0.166
8     A,B    0   1.642   0.000  -2.100   1.642   1.279
9     A,B    0   1.865   0.000   2.179   1.865  -1.168
10    A,B   30   0.448   0.259   0.514   0.518  -0.992
11    A,B   45   1.223   1.223  -0.873   1.730   0.504
12    A,B   60   0.598   1.036  -0.060   1.197   0.050
13    A      0   1.450   0.000   1.746   1.450  -1.205
14    A      0   0.603   0.000   0.020   0.603  -0.034
15    A      0   0.685   0.000   1.331   0.685  -1.943
16    A      0   0.589   0.000   0.596   0.589  -1.012
17    A      0   1.232   0.000   0.552   1.232  -0.448
18    A      0   3.491   0.000   1.678   3.491  -0.481
19    A      0   0.511   0.000   0.053   0.511  -0.105
20    A      0   1.556   0.000  -1.732   1.556   1.114
21    A      0   0.441   0.000  -0.243   0.441   0.550
22    A      0   0.618   0.000  -0.138   0.618   0.223
23    A      0   3.254   0.000  -2.267   3.254   0.697
24    A      0   1.569   0.000  -2.037   1.569   1.298
25    A      0   0.962   0.000   0.742   0.962  -0.771
26    A      0   0.707   0.000  -0.450   0.707   0.637
27    A      0   0.490   0.000   0.276   0.490  -0.563
28    A      0   0.414   0.000   0.387   0.414  -0.935
29    A      0   0.817   0.000  -1.327   0.817   1.624
30    A      0   2.963   0.000   1.193   2.963  -0.403
31    A      0   1.110   0.000   1.827   1.110  -1.646
32    A      0   2.238   0.000  -0.674   2.238   0.301
33    A      0   1.903   0.000   1.475   1.903  -0.775
34    A     30   0.806   0.465  -1.366   0.930   1.468
35    A     45   0.897   0.897   0.227   1.269  -0.179
36    A     60   0.285   0.494  -0.865   0.570   1.517
37    A     20   1.462   0.532   1.910   1.556  -1.227
38    A     30   1.268   0.732  -0.499   1.464   0.341
39    A     45   0.286   0.286   0.378   0.405  -0.933
40    A     60   0.525   0.910  -0.092   1.050   0.088
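
The derived columns of the compensatory tables appear to follow the usual definitions of multidimensional discrimination and difficulty: MDISC = sqrt(a1^2 + a2^2), MID = -d / MDISC, and α = arctan(a2 / a1) expressed in degrees. These formulas are stated here as an assumption checked against the tabled entries, not quoted from this section. The sketch below recomputes the values for item 10 of Table 28 and agrees with the table to within rounding of the printed parameters.

```python
import math

def mdisc(a1, a2):
    """Multidimensional discrimination: length of the discrimination vector."""
    return math.hypot(a1, a2)

def mid(a1, a2, d):
    """Multidimensional difficulty: signed distance from the origin, -d / MDISC."""
    return -d / mdisc(a1, a2)

def angle_deg(a1, a2):
    """Angle (in degrees) between the item vector and the first trait axis."""
    return math.degrees(math.atan2(a2, a1))

# Item 10 of Table 28: a1 = 0.448, a2 = 0.259, d = 0.514
a1, a2, d = 0.448, 0.259, 0.514
item_mdisc = mdisc(a1, a2)
item_mid = mid(a1, a2, d)
item_angle = angle_deg(a1, a2)
print(f"MDISC = {item_mdisc:.3f}, MID = {item_mid:.3f}, angle = {item_angle:.1f}")
```

Small discrepancies in the third decimal place are expected because the tabled a1, a2, and d are themselves rounded.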

Table 29

Simulated Compensatory Item Parameters for MD10 Form B

Item  Form   α      a1      a2       d   MDISC     MID
1     A,B    0   1.307   0.000   0.438   1.307  -0.335
2     A,B    0   1.328   0.000  -0.630   1.328   0.475
3     A,B    0   0.929   0.000   0.476   0.929  -0.512
4     A,B    0   0.631   0.000  -0.076   0.631   0.120
5     A,B    0   0.660   0.000   0.627   0.660  -0.950
6     A,B    0   1.928   0.000   0.031   1.928  -0.016
7     A,B    0   0.456   0.000   0.076   0.456  -0.166
8     A,B    0   1.642   0.000  -2.100   1.642   1.279
9     A,B    0   1.865   0.000   2.179   1.865  -1.168
10    A,B   30   0.448   0.259   0.514   0.518  -0.992
11    A,B   45   1.223   1.223  -0.873   1.730   0.504
12    A,B   60   0.598   1.036  -0.060   1.197   0.050
41    B      0   1.422   0.000  -2.549   1.422   1.793
42    B      0   0.985   0.000   0.125   0.985  -0.127
43    B      0   1.201   0.000   1.477   1.201  -1.230
44    B      0   0.571   0.000   0.067   0.571  -0.117
45    B      0   0.847   0.000  -0.250   0.847   0.296
46    B      0   0.830   0.000  -0.321   0.830   0.387
47    B      0   0.386   0.000  -0.173   0.386   0.448
48    B      0   0.673   0.000  -0.665   0.673   0.989
49    B      0   3.650   0.000   6.230   3.650  -1.707
50    B      0   1.030   0.000   0.228   1.030  -0.222
51    B      0   0.731   0.000   0.408   0.731  -0.558
52    B      0   0.589   0.000  -0.105   0.589   0.178
53    B      0   1.720   0.000   0.625   1.720  -0.364
54    B      0   0.916   0.000   0.950   0.916  -1.037
55    B      0   0.483   0.000   0.675   0.483  -1.396
56    B      0   0.876   0.000  -1.308   0.876   1.494
57    B      0   2.090   0.000   1.529   2.090  -0.732
58    B      0   1.269   0.000   1.118   1.269  -0.881
59    B      0   1.300   0.000  -2.543   1.300   1.956
60    B      0   3.467   0.000   1.880   3.467  -0.542
61    B      0   1.199   0.000   1.602   1.199  -1.336
62    B     30   0.401   0.232  -0.197   0.463   0.426
63    B     45   1.368   1.368  -0.152   1.935   0.078
64    B     60   0.372   0.645  -0.334   0.744   0.448
65    B     20   2.407   0.876   2.017   2.562  -0.787
66    B     30   0.753   0.434  -0.939   0.869   1.080
67    B     45   0.532   0.532   0.906   0.753  -1.203
68    B     60   0.711   1.231   1.282   1.422  -0.902

Table 30

Simulated Compensatory Item Parameters for MD20 Form A

Item  Form   α      a1      a2       d   MDISC     MID
1     A,B    0   0.607   0.000  -0.649   0.607   1.069
2     A,B    0   1.547   0.000   1.315   1.547  -0.850
3     A,B    0   2.411   0.000   2.761   2.411  -1.145
4     A,B    0   1.394   0.000  -0.117   1.394   0.084
5     A,B    0   0.791   0.000   1.304   0.791  -1.649
6     A,B    0   0.682   0.000   1.270   0.682  -1.862
7     A,B   45   1.581   1.581   2.479   2.236  -1.108
8     A,B   60   0.666   1.153   0.642   1.331  -0.482
9     A,B   20   0.732   0.266  -0.104   0.779   0.134
10    A,B   30   0.934   0.539   0.650   1.079  -0.603
11    A,B   45   1.223   1.223   1.194   1.730  -0.690
12    A,B   60   0.516   0.894   0.011   1.032  -0.010
13    A      0   1.019   0.000  -0.549   1.019   0.539
14    A      0   1.792   0.000   0.064   1.792  -0.036
15    A      0   0.953   0.000   0.977   0.953  -1.025
16    A      0   0.797   0.000  -0.213   0.797   0.267
17    A      0   0.541   0.000  -0.379   0.541   0.701
18    A      0   0.514   0.000   0.170   0.514  -0.331
19    A      0   0.789   0.000  -0.421   0.789   0.534
20    A      0   0.950   0.000  -0.359   0.950   0.378
21    A      0   0.865   0.000  -1.582   0.865   1.829
22    A      0   0.453   0.000  -0.247   0.453   0.545
23    A      0   0.741   0.000   0.014   0.741  -0.019
24    A      0   0.764   0.000   0.673   0.764  -0.881
25    A      0   0.756   0.000  -0.901   0.756   1.191
26    A      0   0.300   0.000   0.181   0.300  -0.603
27    A     45   0.690   0.690  -1.701   0.976   1.742
28    A     60   0.313   0.542  -0.537   0.626   0.858
29    A     20   0.541   0.197  -0.653   0.576   1.135
30    A     30   0.689   0.398   0.468   0.796  -0.588
31    A     45   1.378   1.378  -0.348   1.949   0.179
32    A     60   0.417   0.721   1.166   0.833  -1.400
33    A     20   0.552   0.201  -0.126   0.587   0.214
34    A     30   0.605   0.350   0.354   0.699  -0.506
35    A     45   0.813   0.813  -1.413   1.149   1.229
36    A     60   1.057   1.830  -2.439   2.113   1.154
37    A     20   0.896   0.326   1.084   0.953  -1.137
38    A     30   1.344   0.776   1.117   1.551  -0.720
39    A     45   0.998   0.998   1.752   1.411  -1.241
40    A     60   1.079   1.869   0.965   2.158  -0.447


Table 31

Simulated Compensatory Item Parameters for MD20 Form B

Item  Form   α      a1      a2       d   MDISC     MID
1     A,B    0   0.607   0.000  -0.649   0.607   1.069
2     A,B    0   1.547   0.000   1.315   1.547  -0.850
3     A,B    0   2.411   0.000   2.761   2.411  -1.145
4     A,B    0   1.394   0.000  -0.117   1.394   0.084
5     A,B    0   0.791   0.000   1.304   0.791  -1.649
6     A,B    0   0.682   0.000   1.270   0.682  -1.862
7     A,B   45   1.581   1.581   2.479   2.236  -1.108
8     A,B   60   0.666   1.153   0.642   1.331  -0.482
9     A,B   20   0.732   0.266  -0.104   0.779   0.134
10    A,B   30   0.934   0.539   0.650   1.079  -0.603
11    A,B   45   1.223   1.223   1.194   1.730  -0.690
12    A,B   60   0.516   0.894   0.011   1.032  -0.010
41    B      0   0.990   0.000  -1.095   0.990   1.106
42    B      0   1.125   0.000  -0.699   1.125   0.621
43    B      0   1.232   0.000  -0.053   1.232   0.043
44    B      0   1.240   0.000  -0.917   1.240   0.740
45    B      0   1.368   0.000  -0.964   1.368   0.705
46    B      0   1.214   0.000   1.488   1.214  -1.226
47    B      0   1.154   0.000  -1.693   1.154   1.467
48    B      0   0.878   0.000  -0.715   0.878   0.814
49    B      0   0.350   0.000   0.097   0.350  -0.278
50    B      0   1.063   0.000   0.054   1.063  -0.051
51    B      0   1.299   0.000  -1.466   1.299   1.128
52    B      0   1.574   0.000  -0.398   1.574   0.253
53    B      0   1.121   0.000   0.139   1.121  -0.124
54    B      0   0.315   0.000  -0.024   0.315   0.076
55    B     45   0.273   0.273   0.056   0.386  -0.145
56    B     60   0.706   1.223  -0.003   1.413   0.002
57    B     20   1.587   0.578  -1.087   1.689   0.644
58    B     30   0.983   0.567  -0.958   1.135   0.844
59    B     45   1.050   1.050   1.374   1.485  -0.925
60    B     60   0.798   1.382   0.612   1.596  -0.383
61    B     20   1.126   0.410  -1.868   1.198   1.560
62    B     30   0.604   0.349  -0.427   0.697   0.612
63    B     45   0.520   0.520  -1.055   0.735   1.435
64    B     60   0.330   0.571   1.239   0.659  -1.881
65    B     20   0.341   0.124  -0.054   0.363   0.148
66    B     30   1.231   0.711  -0.368   1.422   0.259
67    B     45   0.335   0.335  -0.570   0.474   1.202
68    B     60   0.421   0.730  -1.329   0.842   1.578



Table 32

Simulated Compensatory Item Parameters for MD30 Form A

Item  Form   α      a1      a2       d   MDISC     MID
1     A,B    0   0.475   0.000  -0.584   0.475   1.231
2     A,B    0   0.563   0.000  -0.173   0.563   0.308
3     A,B    0   0.515   0.000   0.652   0.515  -1.266
4     A,B   60   0.736   1.275   1.199   1.472  -0.814
5     A,B   20   1.159   0.422   0.681   1.234  -0.552
6     A,B   30   0.706   0.407  -0.054   0.815   0.066
7     A,B   45   0.936   0.936  -0.939   1.323   0.709
8     A,B   60   0.291   0.504  -0.618   0.582   1.062
9     A,B   20   0.684   0.249  -0.599   0.728   0.822
10    A,B   30   0.882   0.510   1.652   1.019  -1.621
11    A,B   45   1.129   1.129   2.676   1.597  -1.675
12    A,B   60   0.881   1.526  -1.018   1.763   0.578
13    A      0   0.973   0.000   0.549   0.973  -0.565
14    A      0   1.358   0.000  -0.324   1.358   0.239
15    A      0   1.857   0.000   1.417   1.857  -0.763
16    A      0   0.860   0.000  -0.524   0.860   0.609
17    A      0   1.448   0.000   1.538   1.448  -1.062
18    A      0   1.517   0.000  -0.448   1.517   0.295
19    A      0   0.663   0.000  -0.142   0.663   0.214
20    A     60   0.480   0.832   0.723   0.961  -0.753
21    A     20   0.648   0.236  -0.550   0.689   0.798
22    A     30   1.944   1.122   0.992   2.244  -0.442
23    A     45   1.120   1.120   0.654   1.584  -0.413
24    A     60   0.268   0.464  -0.122   0.535   0.228
25    A     20   0.790   0.288   0.295   0.841  -0.351
26    A     30   0.442   0.255   0.159   0.510  -0.313
27    A     45   1.452   1.452   0.019   2.053  -0.009
28    A     60   0.328   0.568  -0.243   0.656   0.370
29    A     20   0.744   0.271   0.055   0.792  -0.070
30    A     30   0.398   0.230   0.315   0.460  -0.686
31    A     45   0.355   0.355   0.924   0.502  -1.840
32    A     60   0.465   0.806  -1.060   0.930   1.140
33    A     20   1.442   0.525  -1.014   1.535   0.661
34    A     30   1.031   0.595  -0.284   1.191   0.238
35    A     45   0.879   0.879   1.320   1.244  -1.061
36    A     60   0.431   0.747  -0.965   0.862   1.119
37    A     20   0.589   0.214   0.533   0.627  -0.850
38    A     30   1.144   0.661   2.296   1.321  -1.738
39    A     45   0.810   0.810   1.050   1.145  -0.917
40    A     60   0.147   0.254  -0.135   0.293   0.461


Table 33

Simulated Compensatory Item Parameters for MD30 Form B

Item  Form   α      a1      a2       d   MDISC     MID
1     A,B    0   0.475   0.000  -0.584   0.475   1.231
2     A,B    0   0.563   0.000  -0.173   0.563   0.308
3     A,B    0   0.515   0.000   0.652   0.515  -1.266
4     A,B   60   0.736   1.275   1.199   1.472  -0.814
5     A,B   20   1.159   0.422   0.681   1.234  -0.552
6     A,B   30   0.706   0.407  -0.054   0.815   0.066
7     A,B   45   0.936   0.936  -0.939   1.323   0.709
8     A,B   60   0.291   0.504  -0.618   0.582   1.062
9     A,B   20   0.684   0.249  -0.599   0.728   0.822
10    A,B   30   0.882   0.510   1.652   1.019  -1.621
11    A,B   45   1.129   1.129   2.676   1.597  -1.675
12    A,B   60   0.881   1.526  -1.018   1.763   0.578
41    B      0   1.609   0.000  -0.935   1.609   0.581
42    B      0   0.771   0.000   0.390   0.771  -0.506
43    B      0   0.875   0.000   0.363   0.875  -0.415
44    B      0   0.650   0.000   0.693   0.650  -1.065
45    B      0   1.272   0.000  -2.252   1.272   1.771
46    B      0   1.971   0.000  -3.295   1.971   1.672
47    B      0   1.133   0.000   1.381   1.133  -1.219
48    B     60   0.471   0.816   0.505   0.943  -0.536
49    B     20   1.781   0.648   1.493   1.896  -0.788
50    B     30   0.805   0.465   0.633   0.929  -0.681
51    B     45   2.105   2.105   4.653   2.977  -1.563
52    B     60   0.576   0.998   1.119   1.152  -0.971
53    B     20   2.093   0.762   0.413   2.227  -0.185
54    B     30   0.834   0.481   1.028   0.963  -1.068
55    B     45   0.399   0.399   0.038   0.564  -0.067
56    B     60   0.393   0.681   0.019   0.787  -0.024
57    B     20   1.663   0.605  -0.731   1.770   0.413
58    B     30   0.866   0.500  -0.573   1.000   0.573
59    B     45   1.048   1.048  -1.546   1.483   1.042
60    B     60   0.350   0.607  -0.425   0.700   0.606
61    B     20   1.206   0.439   0.242   1.283  -0.189
62    B     30   1.300   0.751   1.384   1.501  -0.922
63    B     45   1.301   1.301  -1.453   1.839   0.790
64    B     60   0.150   0.259  -0.432   0.299   1.445
65    B     20   1.383   0.503  -0.724   1.471   0.492
66    B     30   0.841   0.485   0.109   0.971  -0.113
67    B     45   0.467   0.467  -0.542   0.660   0.822
68    B     60   0.313   0.541  -0.324   0.625   0.518

Table 34

Simulated Compensatory Item Parameters for MD40 Form A

Item  Form   α      a1      a2       d   MDISC     MID
1     A,B   20   1.486   0.541   2.255   1.581  -1.426
2     A,B   30   1.254   0.724   0.280   1.449  -0.193
3     A,B   45   1.082   1.082  -1.469   1.530   0.960
4     A,B   60   0.940   1.629   2.542   1.881  -1.352
5     A,B   20   0.562   0.205   0.140   0.598  -0.234
6     A,B   30   0.968   0.559  -0.084   1.118   0.075
7     A,B   45   0.691   0.691   0.531   0.977  -0.543
8     A,B   60   0.438   0.758   0.294   0.876  -0.336
9     A,B   20   2.453   0.893   2.783   2.611  -1.066
10    A,B   30   1.451   0.838  -2.904   1.676   1.733
11    A,B   45   0.522   0.522  -0.147   0.739   0.199
12    A,B   60   0.513   0.889   0.895   1.027  -0.872
13    A     20   1.049   0.382   0.608   1.116  -0.544
14    A     30   0.813   0.469   1.272   0.939  -1.355
15    A     45   0.741   0.741   0.651   1.048  -0.621
16    A     60   0.422   0.731  -0.180   0.845   0.214
17    A     20   1.838   0.669   1.205   1.956  -0.616
18    A     30   1.482   0.856  -0.326   1.712   0.190
19    A     45   1.615   1.615   1.629   2.284  -0.713
20    A     60   0.292   0.506   0.222   0.584  -0.381
21    A     20   0.862   0.314   1.126   0.917  -1.228
22    A     30   1.213   0.700  -1.515   1.401   1.082
23    A     45   0.607   0.607  -0.066   0.858   0.077
24    A     60   0.574   0.994  -1.943   1.148   1.692
25    A     20   1.864   0.678  -0.554   1.983   0.279
26    A     30   1.376   0.795  -0.369   1.589   0.232
27    A     45   0.755   0.755  -0.102   1.068   0.095
28    A     60   0.290   0.503  -0.316   0.581   0.544
29    A     20   0.694   0.253  -0.328   0.739   0.444
30    A     30   0.934   0.539   0.613   1.078  -0.569
31    A     45   0.630   0.630  -0.349   0.891   0.392
32    A     60   0.283   0.490  -0.351   0.566   0.619
33    A     20   2.308   0.840  -2.334   2.456   0.951
34    A     30   0.623   0.360   0.635   0.719  -0.882
35    A     45   0.574   0.574   0.780   0.812  -0.960
36    A     60   0.285   0.494  -0.816   0.570   1.432
37    A     20   1.236   0.450  -0.073   1.315   0.055
38    A     30   1.916   1.106   1.143   2.212  -0.517
39    A     45   0.794   0.794   0.915   1.122  -0.816
40    A     60   0.763   1.321   0.328   1.525  -0.215

Table 35

Simulated Compensatory Item Parameters for MD40 Form B

Item  Form   α      a1      a2       d   MDISC     MID
1     A,B   20   1.486   0.541   2.255   1.581  -1.426
2     A,B   30   1.254   0.724   0.280   1.449  -0.193
3     A,B   45   1.082   1.082  -1.469   1.530   0.960
4     A,B   60   0.940   1.629   2.542   1.881  -1.352
5     A,B   20   0.562   0.205   0.140   0.598  -0.234
6     A,B   30   0.968   0.559  -0.084   1.118   0.075
7     A,B   45   0.691   0.691   0.531   0.977  -0.543
8     A,B   60   0.438   0.758   0.294   0.876  -0.336
9     A,B   20   2.453   0.893   2.783   2.611  -1.066
10    A,B   30   1.451   0.838  -2.904   1.676   1.733
11    A,B   45   0.522   0.522  -0.147   0.739   0.199
12    A,B   60   0.513   0.889   0.895   1.027  -0.872
41    B     20   1.013   0.369  -1.083   1.078   1.004
42    B     30   1.696   0.979   1.507   1.959  -0.769
43    B     45   1.305   1.305   0.598   1.845  -0.324
44    B     60   0.617   1.069   1.348   1.235  -1.092
45    B     20   0.500   0.182  -0.039   0.532   0.073
46    B     30   0.381   0.220   0.033   0.440  -0.075
47    B     45   0.750   0.750  -0.189   1.060   0.178
48    B     60   1.312   2.272   2.665   2.623  -1.016
49    B     20   1.243   0.452  -1.267   1.322   0.958
50    B     30   0.688   0.397   0.466   0.794  -0.587
51    B     45   0.642   0.642  -1.220   0.908   1.344
52    B     60   0.647   1.120  -0.741   1.293   0.573
53    B     20   1.328   0.483  -0.078   1.414   0.055
54    B     30   0.822   0.475  -0.157   0.949   0.165
55    B     45   0.402   0.402   0.242   0.568  -0.427
56    B     60   0.395   0.684  -0.183   0.789   0.232
57    B     20   0.791   0.288  -0.320   0.841   0.380
58    B     30   0.361   0.209   0.034   0.417  -0.082
59    B     45   0.841   0.841   0.307   1.189  -0.258
60    B     60   0.993   1.721  -0.789   1.987   0.397
61    B     20   0.882   0.321   1.679   0.939  -1.788
62    B     30   0.756   0.436   0.614   0.873  -0.703
63    B     45   0.665   0.665   0.514   0.940  -0.546
64    B     60   0.449   0.777  -0.267   0.897   0.297
65    B     20   1.562   0.568   0.071   1.662  -0.042
66    B     30   1.129   0.652   0.014   1.303  -0.011
67    B     45   0.432   0.432   0.322   0.611  -0.527
68    B     60   0.266   0.461  -1.032   0.532   1.940


Table 36

Noncompensatory Item Parameters for Multidimensional Items in MD10 Forms
A and B

Item  Form   α      a1      a2      b1      b2
10    A,B   30   0.352   0.303  -0.247  -1.859
11    A,B   45   0.923   0.916  -1.199  -1.192
12    A,B   60   0.605   0.774  -1.609  -0.538
34    A     30   0.625   0.499  -1.826  -3.009
35    A     45   0.741   0.736  -0.738  -0.731
36    A     60   0.359   0.402  -3.443  -2.080
37    A     20   0.993   0.628   0.859  -1.192
38    A     30   0.926   0.671  -0.741  -1.749
39    A     45   0.268   0.272  -1.294  -1.329
40    A     60   0.552   0.694  -1.727  -0.620
62    B     30   0.327   0.288  -1.429  -3.039
63    B     45   1.006   0.971  -0.808  -0.801
64    B     60   0.430   0.512  -2.320  -1.079
65    B     20   1.535   0.804   0.568  -1.301
66    B     30   0.591   0.477  -1.525  -2.707
67    B     45   0.465   0.476  -0.280  -0.328
68    B     60   0.669   0.900  -0.762   0.327
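
To make the two parameterizations concrete, the following sketch contrasts the response functions that these parameter sets are usually paired with. It assumes the standard logistic forms: a compensatory model in which the traits combine linearly, and a noncompensatory model formed as a product of one-dimensional terms. Whether the study applied a scaling constant to the exponent is not shown in this section, so none is used here; the function names are illustrative only.

```python
import math

def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def p_compensatory(theta1, theta2, a1, a2, d):
    """Compensatory form (assumed): traits combine additively in one exponent."""
    return logistic(a1 * theta1 + a2 * theta2 + d)

def p_noncompensatory(theta1, theta2, a1, a2, b1, b2):
    """Noncompensatory form (assumed): product of one term per trait."""
    return logistic(a1 * (theta1 - b1)) * logistic(a2 * (theta2 - b2))

# Item 10: compensatory parameters from Table 28, noncompensatory from Table 36.
comp = (0.448, 0.259, 0.514)
noncomp = (0.352, 0.303, -0.247, -1.859)

# An examinee high on the first trait and low on the second.
p_c = p_compensatory(5.0, -5.0, *comp)
p_n = p_noncompensatory(5.0, -5.0, *noncomp)

# In the product form, the probability cannot exceed the weaker trait's term.
ceiling = logistic(noncomp[1] * (-5.0 - noncomp[3]))
print(f"compensatory = {p_c:.3f}, noncompensatory = {p_n:.3f}, ceiling = {ceiling:.3f}")
```

In the additive form a high standing on one trait can offset a low standing on the other, whereas in the product form the lower term caps the probability of a correct response.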

Table 37

Noncompensatory Item Parameters for Multidimensional Items in MD20 Form A

Item  Form   α      a1      a2      b1      b2
7     A,B   45   1.100   1.116   0.127   0.282
8     A,B   60   0.644   0.834  -1.234  -0.030
9     A,B   20   0.526   0.378  -0.595  -2.961
10    A,B   30   0.709   0.526  -0.092  -1.177
11    A,B   45   0.132   0.100   0.271   0.408
12    A,B   60   0.542   0.664  -1.764  -0.538
27    A     45   0.585   0.602  -2.523  -2.482
28    A     60   0.373   0.430  -2.941  -1.515
29    A     20   0.123   0.125  -1.570  -5.268
30    A     30   0.537   0.421  -0.273  -1.475
31    A     45   0.970   0.933  -0.951  -0.913
32    A     60   1.012   0.249  -1.899   2.344
33    A     20   0.401   0.306  -0.806  -3.516
34    A     30   0.476   0.382  -0.424  -1.697
35    A     45   0.100   0.107   0.546   0.583
36    A     60   0.784   1.098  -2.475  -1.459
37    A     20   0.390   0.257   1.677  -5.746
38    A     30   0.966   0.667   0.146  -0.840
39    A     45   0.799   0.785   0.065   0.222
40    A     60   0.846   1.247  -1.062   0.072

Table 38

Noncompensatory Item Parameters for Multidimensional Items in MD20 Form B

Item  Form   α      a1      a2      b1      b2
7     A,B   45   1.100   1.116   0.127   0.282
8     A,B   60   0.644   0.834  -1.234  -0.030
9     A,B   20   0.526   0.378  -0.595  -2.961
10    A,B   30   0.709   0.526  -0.092  -1.177
11    A,B   45   0.132   0.100   0.271   0.408
12    A,B   60   0.542   0.664  -1.764  -0.538
55    B     45   0.263   0.260  -2.007  -1.924
56    B     60   0.672   0.865  -1.578  -0.434
57    B     20   1.044   0.622  -0.830  -2.740
58    B     30   0.728   0.571  -1.263  -2.366
59    B     45   0.803   0.828  -0.097  -0.090
60    B     60   0.722   0.972  -1.220  -0.057
61    B     20   0.767   0.548  -1.711  -3.651
62    B     30   0.478   0.400  -1.288  -2.559
63    B     45   0.475   0.478  -2.396  -2.358
64    B     60   0.354   0.430  -0.922   0.689
65    B     20   0.250   0.202  -1.119  -4.901
66    B     30   0.884   0.644  -0.697  -1.755
67    B     45   0.326   0.326  -2.632  -2.577
68    B     60   0.463   0.564  -3.223  -1.960

Table 39

Noncompensatory Item Parameters for Multidimensional Items in MD30 Form A

Item  Form   α      a1      a2      b1      b2
4     A,B   60   0.664   0.888  -0.945   0.309
5     A,B   20   0.778   0.528   0.236  -2.081
6     A,B   30   0.528   0.447  -0.713  -2.092
7     A,B   45   0.705   0.698  -1.534  -1.596
8     A,B   60   0.352   0.395  -3.164  -1.776
9     A,B   20   0.478   0.390  -1.175  -3.624
10    A,B   30   0.638   0.494   0.834  -0.661
11    A,B   45   0.849   0.844   0.555   0.565
12    A,B   60   0.728   0.942  -1.999  -0.964
20    A     60   0.496   0.606  -1.268   0.047
21    A     20   0.184   0.149  -1.188  -4.872
22    A     30   2.256   0.235  -0.426   0.705
23    A     45   0.830   0.792  -0.481  -0.491
24    A     60   0.692   0.248  -4.282   0.835
25    A     20   0.545   0.413  -0.089  -2.602
26    A     30   0.344   0.306  -0.745  -2.472
27    A     45   0.957   0.910  -0.750  -0.786
28    A     60   0.381   0.436  -2.495  -1.136
29    A     20   0.516   0.400  -0.361  -2.876
30    A     30   0.310   0.276  -0.561  -2.430
31    A     45   0.312   0.315  -0.297  -0.388
32    A     60   0.499   0.585  -2.774  -1.633
33    A     20   0.918   0.610  -0.856  -2.870
34    A     30   0.725   0.578  -0.701  -1.944
35    A     45   0.698   0.677  -0.060  -0.058
36    A     60   0.474   0.551  -2.809  -1.638
37    A     20   0.412   0.322   0.187  -2.772
38    A     30   0.814   0.584   1.073  -0.399
39    A     45   0.653   0.636  -0.219  -0.231
40    A     60   0.096   0.207  -4.957  -3.588

Table 40

Noncompensatory Item Parameters for Multidimensional Items in MD30 Form B

Item  Form   α      a1      a2      b1      b2
4     A,B   60   0.664   0.888  -0.945   0.309
5     A,B   20   0.778   0.528   0.236  -2.081
6     A,B   30   0.528   0.447  -0.713  -2.092
7     A,B   45   0.705   0.698  -1.534  -1.596
8     A,B   60   0.352   0.395  -3.164  -1.776
9     A,B   20   0.478   0.390  -1.175  -3.624
10    A,B   30   0.638   0.494   0.834  -0.661
11    A,B   45   0.849   0.844   0.555   0.565
12    A,B   60   0.728   0.942  -1.999  -0.964
48    B     60   0.492   0.595  -1.443  -0.150
49    B     20   1.143   0.674   0.583  -1.718
50    B     30   0.592   0.481  -0.009  -1.413
51    B     45   1.329   1.388   0.690   0.645
52    B     60   0.562   0.717  -0.967   0.331
53    B     20   1.294   0.705   0.019  -2.054
54    B     30   0.609   0.485   0.338  -1.106
55    B     45   0.365   0.366  -1.455  -1.516
56    B     60   0.435   0.508  -1.994  -0.692
57    B     20   1.043   0.647  -0.594  -2.610
58    B     30   0.625   0.523  -1.062  -2.318
59    B     45   0.751   0.750  -1.823  -1.884
60    B     60   0.403   0.462  -2.594  -1.291
61    B     20   0.803   0.545  -0.087  -2.307
62    B     30   0.899   0.647   0.429  -0.929
63    B     45   0.856   0.844  -1.536  -1.601
64    B     60   0.333   0.100  -2.658   3.029
65    B     20   0.934   0.577  -0.713  -2.671
66    B     30   0.614   0.503  -0.471  -1.803
67    B     45   0.423   0.425  -1.983  -2.039
68    B     60   0.368   0.418  -2.663  -1.290

Table 41

Noncompensatory Item Parameters for Multidimensional Items in MD40 Form A

Item  Form   α      a1      a2      b1      b2
1     A,B   20   1.023   0.595   1.034  -1.132
2     A,B   30   0.932   0.687  -0.250  -1.413
3     A,B   45   0.796   0.844  -1.730  -1.621
4     A,B   60   0.743   1.141  -0.440   0.815
5     A,B   20   0.403   0.324  -0.410  -3.057
6     A,B   30   0.712   0.583  -0.621  -1.668
7     A,B   45   0.582   0.598  -0.650  -0.591
8     A,B   60   0.467   0.583  -1.664  -0.358
9     A,B   20   1.576   0.752   0.841  -1.235
10    A,B   30   0.935   0.825  -2.037  -2.758
11    A,B   45   0.468   0.482  -1.432  -1.358
12    A,B   60   0.519   0.668  -1.123   0.164
13    A     20   0.726   0.499   0.142  -1.977
14    A     30   0.605   0.491   0.490  -0.776
15    A     45   0.613   0.633  -0.543  -0.488
16    A     60   0.458   0.569  -2.129  -0.821
17    A     20   1.205   0.662   0.363  -1.623
18    A     30   0.994   0.754  -0.601  -1.553
19    A     45   1.030   1.095  -0.145  -0.112
20    A     60   0.336   0.403  -2.146  -0.601
21    A     20   0.602   0.420   0.645  -1.720
22    A     30   0.845   0.708  -1.442  -2.314
23    A     45   0.530   0.544  -1.227  -1.153
24    A     60   0.533   0.750  -3.361  -1.948
25    A     20   1.185   0.721  -0.475  -2.198
26    A     30   0.939   0.728  -0.655  -1.610
27    A     45   0.630   0.644  -1.116  -1.037
28    A     60   0.341   0.409  -2.834  -1.313
29    A     20   0.496   0.398  -0.870  -3.100
30    A     30   0.688   0.559  -0.087  -1.219
31    A     45   0.547   0.565  -1.453  -1.372
32    A     60   0.334   0.400  -2.929  -1.391
33    A     20   1.374   0.867  -1.095  -2.547
34    A     30   0.477   0.406  -0.071  -1.394
35    A     45   0.495   0.508  -0.461  -0.417
36    A     60   0.339   0.410  -3.552  -1.999
37    A     20   2.651   0.528  -2.881   3.975
38    A     30   1.001   0.317   1.926   0.677
39    A     45   0.644   0.669  -0.344  -0.300
40    A     60   0.683   0.932  -1.394  -0.232

Table 42

Noncompensatory Item Parameters for Multidimensional Items in MD40 Form B

Item  Form   α      a1      a2      b1      b2
1     A,B   20   1.023   0.595   1.034  -1.132
2     A,B   30   0.932   0.687  -0.250  -1.413
3     A,B   45   0.796   0.844  -1.730  -1.621
4     A,B   60   0.743   1.141  -0.440   0.815
5     A,B   20   0.403   0.324  -0.410  -3.057
6     A,B   30   0.712   0.583  -0.621  -1.668
7     A,B   45   0.582   0.598  -0.650  -0.591
8     A,B   60   0.467   0.583  -1.664  -0.358
9     A,B   20   1.576   0.752   0.841  -1.235
10    A,B   30   0.935   0.825  -2.037  -2.758
11    A,B   45   0.468   0.482  -1.432  -1.358
12    A,B   60   0.519   0.668  -1.123   0.164
41    B     20   1.089   0.255   0.000   0.000
42    B     30   0.100   0.254   1.863   0.380
43    B     45   0.904   0.959  -0.548  -0.481
44    B     60   0.585   0.785  -0.830   0.437
45    B     20   0.361   0.299  -0.731  -3.511
46    B     30   0.304   0.278  -1.119  -2.776
47    B     45   0.626   0.644  -1.185  -1.106
48    B     60   0.834   1.439  -0.883   0.736
49    B     20   0.836   0.618  -1.162  -2.849
50    B     30   0.525   0.444  -0.240  -1.485
51    B     45   0.619   0.523  -0.763  -1.850
52    B     60   0.606   0.821  -2.144  -0.966
53    B     20   0.894   0.595  -0.326  -2.212
54    B     30   0.247   1.823   0.689  -2.552
55    B     45   0.369   0.378  -1.217  -1.163
56    B     60   0.435   0.537  -2.208  -0.872
57    B     20   0.559   0.439  -0.762  -2.881
58    B     30   0.289   0.266  -1.169  -2.893
59    B     45   0.678   0.700  -0.771  -0.700
60    B     60   0.769   1.141  -1.824  -0.723
61    B     20   0.614   0.409   1.123  -1.304
62    B     30   0.570   0.476  -0.086  -1.306
63    B     45   0.565   0.580  -0.673  -0.615
64    B     60   0.479   0.602  -2.139  -0.862
65    B     20   1.031   0.641  -0.202  -2.065
66    B     30   0.807   0.642  -0.494  -1.513
67    B     45   0.392   0.402  -1.061  -1.009
68    B     60   0.321   0.388  -4.059  -2.426

Table 43

Analytical Estimates of Unidimensional Item Parameters for MD10 Form A

                Compensatory   Noncompensatory
Item  Form   α       a       b       a       b
1     A,B    0   0.763  -0.336   0.763  -0.336
2     A,B    0   0.775   0.476   0.775   0.476
3     A,B    0   0.543  -0.514   0.543  -0.514
4     A,B    0   0.369   0.120   0.369   0.120
5     A,B    0   0.386  -0.954   0.386  -0.954
6     A,B    0   1.123  -0.016   1.123  -0.016
7     A,B    0   0.267  -0.167   0.267  -0.167
8     A,B    0   0.957   1.284   0.957   1.284
9     A,B    0   1.086  -1.173   1.086  -1.173
10    A,B   30   0.274  -1.093   0.273  -1.399
11    A,B   45   0.656   0.655   0.762  -1.694
12    A,B   60   0.352   0.086   0.563  -1.441
13    A      0   0.846  -1.209   0.846  -1.209
14    A      0   0.352  -0.033   0.352  -0.033
15    A      0   0.400  -1.951   0.400  -1.951
16    A      0   0.344  -1.016   0.344  -1.016
17    A      0   0.719  -0.449   0.719  -0.449
18    A      0   2.008  -0.482   2.008  -0.482
19    A      0   0.299  -0.104   0.299  -0.104
20    A      0   0.908   0.117   0.908   0.117
21    A      0   0.258   0.553   0.258   0.553
22    A      0   0.361   0.224   0.361   0.224
23    A      0   1.876   0.699   1.876   0.699
24    A      0   0.915   1.303   0.915   1.303
25    A      0   0.562  -0.774   0.562  -0.774
26    A      0   0.413   0.639   0.413   0.639
27    A      0   0.286  -0.565   0.286  -0.565
28    A      0   0.242  -0.938   0.242  -0.938
29    A      0   0.478   1.631   0.478   1.631
30    A      0   1.713  -0.404   1.713  -0.404
31    A      0   0.648  -1.653   0.648  -1.653
32    A      0   1.301   0.302   1.301   0.302
33    A      0   1.108  -0.778   1.108  -0.778
34    A     30   0.484   1.615   0.469  -3.307
35    A     45   0.518  -0.232   0.612  -1.041
36    A     60   0.186   2.625   0.314  -3.876
37    A     20   0.862  -1.269   0.679   0.089
38    A     30   0.736   0.375   0.668  -1.632
39    A     45   0.180  -1.214   0.223  -1.860
40    A     60   0.318   0.151   0.510  -1.587

Table 44
Analytical Estimates of Unidimensional Item Parameters for MD10 Form B

                 Compensatory  Noncompensatory
Item  Form   α       a       b       a       b
   1  A,B    0   0.763  -0.336   0.763  -0.336
   2  A,B    0   0.775   0.476   0.775   0.476
   3  A,B    0   0.543  -0.514   0.543  -0.514
   4  A,B    0   0.369   0.120   0.369   0.120
   5  A,B    0   0.386  -0.954   0.386  -0.954
   6  A,B    0   1.123  -0.016   1.123  -0.016
   7  A,B    0   0.267  -0.167   0.267  -0.167
   8  A,B    0   0.957   1.284   0.957   1.284
   9  A,B    0   1.086  -1.173   1.086  -1.173
  10  A,B   30   0.274  -1.093   0.273  -1.399
  11  A,B   45   0.656   0.655   0.762  -1.694
  12  A,B   60   0.352   0.086   0.563  -1.441
  41  B      0   0.830   1.800   0.830   1.800
  42  B      0   0.576  -0.127   0.576  -0.127
  43  B      0   0.701  -1.235   0.701  -1.235
  44  B      0   0.334  -0.117   0.334  -0.117
  45  B      0   0.495   0.296   0.495   0.296
  46  B      0   0.485   0.388   0.485   0.388
  47  B      0   0.226   0.450   0.226   0.450
  48  B      0   0.393   0.992   0.393   0.992
  49  B      0   2.096  -1.714   2.096  -1.714
  50  B      0   0.602  -0.222   0.602  -0.222
  51  B      0   0.427  -0.560   0.427  -0.560
  52  B      0   0.344   0.179   0.344   0.179
  53  B      0   1.003  -0.364   1.003  -0.364
  54  B      0   0.535  -1.041   0.535  -1.041
  55  B      0   0.282  -1.403   0.282  -1.403
  56  B      0   0.512   1.499   0.512   1.499
  57  B      0   1.216  -0.734   1.216  -0.734
  58  B      0   0.741  -0.884   0.741  -0.884
  59  B      0   0.759   1.964   0.759   1.964
  60  B      0   1.995  -0.544   1.995  -0.544
  61  B      0   0.700  -1.341   0.700  -1.341
  62  B     30   0.245   0.468   0.256  -3.081
  63  B     45   0.708   0.102   0.820  -1.139
  64  B     60   0.238   0.776   0.387  -2.347
  65  B     20   1.361  -0.814   0.965  -0.103
  66  B     30   0.454   1.188   0.446  -2.888
  67  B     45   0.327  -1.564   0.389  -0.431
  68  B     60   0.399  -1.559   0.637  -0.196

Table 45
Analytical Estimates of Unidimensional Item Parameters for MD20 Form A

                 Compensatory  Noncompensatory
Item  Form   α       a       b       a       b
   1  A,B    0   0.320   1.178   0.320   1.178
   2  A,B    0   0.771  -0.937   0.771  -0.937
   3  A,B    0   1.104  -1.262   1.104  -1.262
   4  A,B    0   0.703   0.092   0.703   0.092
   5  A,B    0   0.414  -1.817   0.414  -1.817
   6  A,B    0   0.359  -2.053   0.359  -2.053
   7  A,B   45   1.126  -1.180   0.921   0.290
   8  A,B   60   0.584  -0.589   0.610  -0.786
   9  A,B   20   0.456   0.134   0.376  -2.233
  10  A,B   30   0.630  -0.605   0.513  -0.781
  11  A,B   45   0.901  -0.735   0.096   0.465
  12  A,B   60   0.470  -0.013   0.499  -1.543
  13  A      0   0.527   0.594   0.527   0.594
  14  A      0   0.874  -0.039   0.874  -0.039
  15  A      0   0.495  -1.130   0.495  -1.130
  16  A      0   0.417   0.294   0.417   0.294
  17  A      0   0.286   0.772   0.286   0.772
  18  A      0   0.272  -0.364   0.272  -0.364
  19  A      0   0.413   0.588   0.413   0.588
  20  A      0   0.493   0.416   0.493   0.416
  21  A      0   0.451   2.016   0.451   2.016
  22  A      0   0.240   0.601   0.240   0.601
  23  A      0   0.389  -0.020   0.389  -0.020
  24  A      0   0.400  -0.971   0.400  -0.971
  25  A      0   0.396   1.314   0.396   1.314
  26  A      0   0.160  -0.665   0.160  -0.665
  27  A     45   0.529   1.856   0.493  -3.540
  28  A     60   0.295   1.048   0.333  -3.084
  29  A     20   0.337   1.138   0.103  -4.857
  30  A     30   0.466  -0.590   0.398  -1.130
  31  A     45   1.002   0.190   0.791  -1.318
  32  A     60   0.386  -1.710   0.507  -1.483
  33  A     20   0.344   0.215   0.294  -2.791
  34  A     30   0.409  -0.508   0.357  -1.398
  35  A     45   0.619   1.308   0.086   0.799
  36  A     60   0.828   1.410   0.772  -2.670
  37  A     20   0.558  -1.141   0.269  -1.791
  38  A     30   0.906  -0.722   0.677  -0.361
  39  A     45   0.750  -1.321   0.658   0.201
  40  A     60   0.839  -0.546   0.852  -0.548

Table 46
Analytical Estimates of Unidimensional Item Parameters for MD20 Form B

                 Compensatory  Noncompensatory
Item  Form   α       a       b       a       b
   1  A,B    0   0.320   1.178   0.320   1.178
   2  A,B    0   0.771  -0.937   0.771  -0.937
   3  A,B    0   1.104  -1.262   1.104  -1.262
   4  A,B    0   0.703   0.092   0.703   0.092
   5  A,B    0   0.414  -1.817   0.414  -1.817
   6  A,B    0   0.359  -2.053   0.359  -2.053
   7  A,B   45   1.126  -1.180   0.921   0.290
   8  A,B   60   0.584  -0.589   0.610  -0.786
   9  A,B   20   0.456   0.134   0.376  -2.233
  10  A,B   30   0.630  -0.605   0.513  -0.781
  11  A,B   45   0.901  -0.735   0.096   0.465
  12  A,B   60   0.470  -0.013   0.499  -1.543
  41  B      0   0.512   1.219   0.830   1.800
  42  B      0   0.578   0.685   0.576  -0.127
  43  B      0   0.629   0.047   0.701  -1.235
  44  B      0   0.632   0.815   0.334  -0.117
  45  B      0   0.691   0.776   0.495   0.296
  46  B      0   0.620  -1.351   0.485   0.388
  47  B      0   0.592   1.617   0.226   0.450
  48  B      0   0.458   0.897   0.393   0.992
  49  B      0   0.186  -0.305   2.096  -1.714
  50  B      0   0.548  -0.056   0.602  -0.222
  51  B      0   0.660   1.244   0.427  -0.560
  52  B      0   0.782   0.278   0.344   0.179
  53  B      0   0.576  -0.136   1.003  -0.364
  54  B      0   0.168   0.084   0.535  -1.041
  55  B     45   0.213  -0.154   0.217  -2.780
  56  B     60   0.613   0.002   0.634  -1.324
  57  B     20   0.986   0.645   0.687  -2.172
  58  B     30   0.664   0.847   0.540  -2.466
  59  B     45   0.786  -0.985   0.677  -0.132
  60  B     60   0.676  -0.468   0.697  -0.783
  61  B     20   0.701   1.564   0.546  -3.552
  62  B     30   0.408   0.614   0.365  -2.636
  63  B     45   0.402   1.527   0.396  -3.362
  64  B     60   0.310  -2.295   0.325  -0.054
  65  B     20   0.213   0.149   0.188  -3.965
  66  B     30   0.831   0.259   0.634  -1.611
  67  B     45   0.261   1.281   0.271  -3.683
  68  B     60   0.390   1.928   0.425  -3.584

Table 47
Analytical Estimates of Unidimensional Item Parameters for MD30 Form A

                 Compensatory  Noncompensatory
Item  Form   α       a       b       a       b
   1  A,B    0   0.242   1.408   0.242  -1.408
   2  A,B    0   0.286   0.352   0.286  -0.352
   3  A,B    0   0.262  -1.449   0.262   1.449
   4  A,B   60   0.679  -0.949   0.619  -0.330
   5  A,B   20   0.712  -0.559   0.551  -0.975
   6  A,B   30   0.479   0.066   0.406  -1.897
   7  A,B   45   0.733   0.737   0.577  -2.229
   8  A,B   60   0.289   1.237   0.305  -3.490
   9  A,B   20   0.422   0.833   0.362  -3.202
  10  A,B   30   0.599  -1.627   0.474   0.254
  11  A,B   45   0.875  -1.742   0.696   0.798
  12  A,B   60   0.790   0.673   0.667  -2.051
  13  A      0   0.482  -0.646   0.482   0.646
  14  A      0   0.650   0.273   0.650  -0.273
  15  A      0   0.842  -0.874   0.842   0.874
  16  A      0   0.429   0.698   0.429  -0.698
  17  A      0   0.687  -1.216   0.687   1.216
  18  A      0   0.715   0.338   0.715  -0.338
  19  A      0   0.335   0.245   0.335  -0.245
  20  A     60   0.466  -0.877   0.446  -0.786
  21  A     20   0.400   0.808   0.139  -3.990
  22  A     30   1.320  -0.442   0.928  -0.412
  23  A     45   0.868  -0.429   0.669  -0.690
  24  A     60   0.267   0.265   0.407  -3.941
  25  A     20   0.487  -0.355   0.402  -1.642
  26  A     30   0.300  -0.312   0.270  -2.204
  27  A     45   1.103  -0.010   0.770  -1.090
  28  A     60   0.325   0.432   0.333  -2.545
  29  A     20   0.459  -0.070   0.384  -2.047
  30  A     30   0.270  -0.685   0.243  -2.039
  31  A     45   0.283  -1.913   0.258  -0.488
  32  A     60   0.452   1.327   0.440  -3.108
  33  A     20   0.882   0.669   0.646  -2.307
  34  A     30   0.700   0.239   0.545  -1.760
  35  A     45   0.690  -1.104   0.567  -0.083
  36  A     60   0.421   1.304   0.417  -3.137
  37  A     20   0.363  -0.862   0.307  -1.559
  38  A     30   0.777  -1.738   0.588   0.639
  39  A     45   0.637  -0.953   0.531  -0.320
  40  A     60   0.148   0.536   0.118  -6.017

Table 48
Analytical Estimates of Unidimensional Item Parameters for MD30 Form B

                 Compensatory  Noncompensatory
Item  Form   α       a       b       a       b
   1  A,B    0   0.242   1.408   0.242  -1.408
   2  A,B    0   0.286   0.352   0.286  -0.352
   3  A,B    0   0.262  -1.449   0.262   1.449
   4  A,B   60   0.679  -0.949   0.619  -0.330
   5  A,B   20   0.712  -0.559   0.551  -0.975
   6  A,B   30   0.479   0.066   0.406  -1.897
   7  A,B   45   0.733   0.737   0.577  -2.229
   8  A,B   60   0.289   1.237   0.305  -3.490
   9  A,B   20   0.422   0.833   0.362  -3.202
  10  A,B   30   0.599  -1.627   0.474   0.254
  11  A,B   45   0.875  -1.742   0.696   0.798
  12  A,B   60   0.790   0.673   0.667  -2.051
  41  B      0   0.751   0.665   0.751  -0.665
  42  B      0   0.387  -0.579   0.387   0.579
  43  B      0   0.436  -0.475   0.436   0.475
  44  B      0   0.328  -1.221   0.328   1.221
  45  B      0   0.614   2.027   0.614  -2.027
  46  B      0   0.882   1.914   0.882  -1.914
  47  B      0   0.554  -1.395   0.554   1.395
  48  B     60   0.458  -0.624   0.440  -1.061
  49  B     20   1.084  -0.798   0.770  -0.373
  50  B     30   0.547  -0.681   0.448  -0.898
  51  B     45   1.519  -1.625   1.102   0.953
  52  B     60   0.549  -1.131   0.515  -0.346
  53  B     20   1.267  -0.188   0.847  -0.978
  54  B     30   0.566  -1.068   0.457  -0.424
  55  B     45   0.318  -0.070   0.301  -2.118
  56  B     60   0.386  -0.028   0.383  -1.861
  57  B     20   1.014   0.418   0.716  -1.890
  58  B     30   0.588   0.573   0.478  -2.303
  59  B     45   0.816   1.084   0.617  -2.642
  60  B     60   0.346   0.707   0.352  -2.730
  61  B     20   0.740  -0.191   0.569  -1.370
  62  B     30   0.883  -0.922   0.651  -0.194
  63  B     45   0.998   0.821   0.699  -2.234
  64  B     60   0.151   1.680   0.190  -1.793
  65  B     20   0.847   0.498   0.640  -2.021
  66  B     30   0.571  -0.112   0.466  -1.507
  67  B     45   0.372   0.853   0.349  -2.868
  68  B     60   0.310   0.604   0.321  -2.778

Table 49
Analytical Estimates of Unidimensional Item Parameters for MD40 Form A

                 Compensatory  Noncompensatory
Item  Form   α       a       b       a       b
   1  A,B   20   0.863  -1.485   0.680  -0.329
   2  A,B   30   0.842  -0.194   0.679  -1.041
   3  A,B   45   0.881   0.971   0.675  -2.382
   4  A,B   60   0.925  -1.475   0.746   0.462
   5  A,B   20   0.336  -0.243   0.304  -2.235
   6  A,B   30   0.651   0.075   0.541  -1.537
   7  A,B   45   0.566  -0.549   0.487  -0.881
   8  A,B   60   0.461  -0.366   0.429  -1.345
   9  A,B   20   1.352  -1.110   0.962   0.234
  10  A,B   30   0.973   1.743   0.733  -3.353
  11  A,B   45   0.428   0.201   0.392  -1.981
  12  A,B   60   0.537  -0.952   0.483  -0.572
  13  A     20   0.619  -0.567   0.515  -1.007
  14  A     30   0.547  -1.363   0.458  -0.108
  15  A     45   0.606  -0.628   0.514  -0.732
  16  A     60   0.445   0.232   0.420  -2.012
  17  A     20   1.050  -0.641   0.783  -0.472
  18  A     30   0.994   0.191   0.732  -1.419
  19  A     45   1.301  -0.721   0.873  -0.182
  20  A     60   0.311  -0.414   0.303  -1.864
  21  A     20   0.512  -1.278   0.429  -0.457
  22  A     30   0.815   1.088   0.648  -2.591
  23  A     45   0.497   0.077   0.444  -1.690
  24  A     60   0.596   1.848   0.519  -3.651
  25  A     20   1.063   0.291   0.800  -1.566
  26  A     30   0.924   0.233   0.698  -1.505
  27  A     45   0.618   0.096   0.526  -1.528
  28  A     60   0.309   0.594   0.307  -2.867
  29  A     20   0.413   0.462   0.373  -2.619
  30  A     30   0.628  -0.571   0.521  -0.836
  31  A     45   0.516   0.396   0.459  -2.006
  32  A     60   0.302   0.677   0.301  -2.991
  33  A     20   1.284   0.990   0.938  -2.306
  34  A     30   0.420  -0.887   0.368  -0.957
  35  A     45   0.470  -0.971   0.414  -0.623
  36  A     60   0.304   1.562   0.307  -3.866
  37  A     20   0.725   0.057   1.106  -2.338
  38  A     30   1.280  -0.519   0.555   2.208
  39  A     45   0.649  -0.824   0.542  -0.457
  40  A     60   0.773  -0.234   0.651  -1.040

Table 50
Analytical Estimates of Unidimensional Item Parameters for MD40 Form B

                 Compensatory  Noncompensatory
Item  Form   α       a       b       a       b
   1  A,B   20   0.863  -1.485   0.680  -0.329
   2  A,B   30   0.842  -0.194   0.679  -1.041
   3  A,B   45   0.881   0.971   0.675  -2.382
   4  A,B   60   0.925  -1.475   0.746   0.462
   5  A,B   20   0.336  -0.243   0.304  -2.235
   6  A,B   30   0.651   0.075   0.541  -1.537
   7  A,B   45   0.566  -0.549   0.487  -0.881
   8  A,B   60   0.461  -0.366   0.429  -1.345
   9  A,B   20   1.352  -1.110   0.962   0.234
  10  A,B   30   0.973   1.743   0.733  -3.353
  11  A,B   45   0.428   0.201   0.392  -1.981
  12  A,B   60   0.537  -0.952   0.483  -0.572
  41  B     20   0.599   1.046   0.562   0.000
  42  B     30   1.135  -0.774   0.140   1.178
  43  B     45   1.059  -0.327   0.767  -0.730
  44  B     60   0.638  -1.192   0.555  -0.149
  45  B     20   0.299   0.076   0.275  -2.802
  46  B     30   0.257  -0.075   0.242  -2.701
  47  B     45   0.614   0.180   0.524  -1.627
  48  B     60   1.201  -1.109   0.873   0.206
  49  B     20   0.729   0.998   0.609  -2.633
  50  B     30   0.463  -0.590   0.404  -1.142
  51  B     45   0.526   1.359   0.476  -1.776
  52  B     60   0.666   0.625   0.578  -2.108
  53  B     20   0.776   0.057   0.626  -1.506
  54  B     30   0.554   0.166   0.648  -3.289
  55  B     45   0.330  -0.430   0.309  -1.690
  56  B     60   0.418   0.252   0.397  -2.105
  57  B     20   0.470   0.396   0.417  -2.380
  58  B     30   0.243  -0.081   0.230  -2.822
  59  B     45   0.687  -0.261   0.569  -1.044
  60  B     60   0.968   0.433   0.759  -1.683
  61  B     20   0.523  -1.864   0.430   0.213
  62  B     30   0.509  -0.707   0.436  -0.903
  63  B     45   0.544  -0.552   0.473  -0.914
  64  B     60   0.473   0.324   0.441  -2.046
  65  B     20   0.904  -0.044   0.703  -1.274
  66  B     30   0.759  -0.010   0.606  -1.329
  67  B     45   0.354  -0.533   0.328  -1.470
  68  B     60   0.284   2.117   0.291  -4.530


Table 51
Descriptive Statistics for Compensatory MD10 Linking Items with Randomly
Equivalent Groups

                 Form A          Form B
Replication    Mean      SD    Mean      SD
     1       -0.196   0.801  -0.107   0.771
     2       -0.105   0.788  -0.070   0.835
     3       -0.117   0.827  -0.197   0.774
     4       -0.169   0.763  -0.170   0.846
     5       -0.166   0.889  -0.141   0.784
     6       -0.104   0.761  -0.154   0.806
     7       -0.164   0.823  -0.146   0.752
     8       -0.170   0.754  -0.123   0.833
     9       -0.125   0.828  -0.097   0.762
    10       -0.084   0.753  -0.102   0.740
    11       -0.071   0.779  -0.057   0.748
    12       -0.059   0.750  -0.119   0.765
    13       -0.128   0.780  -0.127   0.769
    14       -0.135   0.779  -0.096   0.774
    15       -0.083   0.793  -0.102   0.766
    16       -0.109   0.778  -0.200   0.772
    17       -0.167   0.777  -0.121   0.742
    18       -0.115   0.753  -0.122   0.834
    19       -0.132   0.851  -0.167   0.729
    20       -0.178   0.751  -0.232   0.740

Table 52
Descriptive Statistics for Compensatory MD20 Linking Items with Randomly
Equivalent Groups

                 Form A          Form B
Replication    Mean      SD    Mean      SD
     1       -0.646   0.885  -0.661   0.824
     2       -0.666   0.892  -0.652   0.931
     3       -0.653   0.949  -0.631   0.836
     4       -0.601   0.862  -0.643   0.807
     5       -0.623   0.823  -0.570   0.843
     6       -0.600   0.882  -0.651   0.886
     7       -0.619   0.918  -0.686   0.889
     8       -0.674   0.934  -0.660   0.877
     9       -0.664   0.907  -0.656   0.858
    10       -0.631   0.877  -0.651   0.962
    11       -0.652   0.972  -0.687   0.870
    12       -0.751   0.909  -0.653   0.837
    13       -0.613   0.831  -0.626   0.895
    14       -0.630   0.943  -0.665   0.875
    15       -0.608   0.866  -0.718   0.924
    16       -0.698   0.950  -0.647   0.907
    17       -0.639   0.904  -0.645   0.905
    18       -0.671   0.958  -0.239   1.019
    19       -0.575   0.869  -0.713   1.045
    20       -0.682   1.059  -0.651   0.891

Table 53
Descriptive Statistics for Compensatory MD30 Linking Items with Randomly
Equivalent Groups

                 Form A          Form B
Replication    Mean      SD    Mean      SD
     1       -0.012   1.096  -0.040   1.181
     2       -0.052   1.130  -0.070   1.159
     3       -0.111   1.175  -0.167   1.141
     4       -0.159   1.105  -0.013   1.064
     5       -0.045   1.083  -0.180   1.125
     6       -0.182   1.102  -0.143   1.256
     7       -0.147   1.207  -0.079   1.141
     8       -0.068   1.111  -0.063   1.042
     9       -0.074   1.037  -0.117   1.084
    10       -0.107   1.077  -0.156   1.090
    11       -0.203   1.100  -0.109   1.212
    12       -0.073   1.157  -0.096   1.125
    13       -0.119   1.120  -0.218   1.180
    14       -0.232   1.206  -0.114   1.208
    15       -0.113   1.191  -0.091   1.149
    16       -0.075   1.126  -0.100   1.163
    17       -0.104   1.167  -0.121   1.134
    18       -0.107   1.126  -0.080   1.152
    19       -0.102   1.143  -0.097   1.112
    20       -0.107   1.122   0.029   1.099

Table 54
Descriptive Statistics for Compensatory MD40 Linking Items with Randomly
Equivalent Groups

                 Form A          Form B
Replication    Mean      SD    Mean      SD
     1       -0.314   1.071  -0.358   0.993
     2       -0.323   1.036  -0.320   0.997
     3       -0.310   1.008  -0.332   0.954
     4       -0.329   0.967  -0.258   0.944
     5       -0.255   0.954  -0.343   0.989
     6       -0.360   0.981  -0.221   0.965
     7       -0.251   0.993  -0.285   0.973
     8       -0.288   0.988  -0.348   1.046
     9       -0.360   1.035  -0.236   1.049
    10       -0.242   1.038  -0.320   1.025
    11       -0.318   1.048  -0.277   1.008
    12       -0.272   1.030  -0.275   1.010
    13       -0.257   1.030  -0.252   1.015
    14       -0.274   1.049  -0.337   0.957
    15       -0.343   0.956  -0.282   1.048
    16       -0.264   1.088  -0.291   1.024
    17       -0.299   1.044  -0.300   1.001
    18       -0.284   1.020  -0.331   1.036
    19       -0.348   1.072  -0.261   0.990
    20       -0.245   1.003  -0.317   1.074

Table 55
Descriptive Statistics for Noncompensatory MD10 Linking Items

                 Form A          Form B
Replication    Mean      SD    Mean      SD
     1        0.080   0.785   0.051   0.756
     2        0.102   0.777   0.124   0.744
     3        0.136   0.768   0.192   0.817
     4        0.206   0.804   0.125   0.765
     5        0.069   0.777   0.041   0.779
     6        0.059   0.778   0.130   0.796
     7        0.130   0.812   0.083   0.745
     8        0.117   0.779   0.172   0.792
     9        0.156   0.800   0.172   0.834
    10        0.163   0.860   0.158   0.780
    11        0.151   0.776   0.178   0.764
    12        0.214   0.775   0.135   0.827
    13        0.104   0.830   0.110   0.785
    14        0.100   0.816   0.165   0.833
    15        0.187   0.846   0.111   0.785
    16        0.147   0.781   0.099   0.764
    17        0.067   0.763   0.125   0.858
    18        0.134   0.888   0.160   0.825
    19        0.197   0.840   0.093   0.732
    20        0.102   0.739   0.066   0.765

Table 56
Descriptive Statistics for Noncompensatory MD20 Linking Items

                 Form A          Form B
Replication    Mean      SD    Mean      SD
     1        0.961   1.386   0.988   1.374
     2        0.969   1.393   0.924   1.405
     3        0.893   1.356   0.903   1.211
     4        0.905   1.217   1.047   1.386
     5        1.013   1.327   1.043   1.336
     6        0.995   1.244   0.974   1.229
     7        1.002   1.355   0.970   1.396
     8        0.889   1.248   0.966   1.290
     9        0.946   1.308   0.883   1.090
    10        0.894   1.080   1.055   1.512
    11        0.953   1.362   1.072   1.770
    12        1.042   1.717   1.001   1.397
    13        0.970   1.411   1.262   1.784
    14        1.184   1.663   0.748   1.134
    15        0.759   1.145   0.951   1.323
    16        0.936   1.278   1.016   1.359
    17        1.033   1.314   1.030   1.532
    18        0.987   1.455   1.078   1.434
    19        1.042   1.445   0.930   1.174
    20        0.926   1.190   1.008   1.523

Table 57
Descriptive Statistics for Noncompensatory MD30 Linking Items

                 Form A          Form B
Replication    Mean      SD    Mean      SD
     1        0.147   1.150   0.170   1.234
     2        0.178   1.236  -0.019   1.155
     3        0.004   1.136   0.046   1.226
     4        0.041   1.172   0.099   1.280
     5        0.068   1.237   0.001   1.179
     6       -0.002   1.151   0.139   1.280
     7        0.160   1.268   0.050   1.252
     8        0.039   1.238   0.041   1.222
     9        0.043   1.200   0.204   1.298
    10        0.205   1.241   0.077   1.144
    11        0.062   1.109   0.097   1.237
    12        0.131   1.200   0.082   1.228
    13        0.112   1.242   0.082   1.228
    14        0.052   1.171   0.169   1.320
    15        0.146   1.244   0.106   1.255
    16        0.101   1.230   0.054   1.186
    17        0.032   1.126   0.065   1.355
    18        0.073   1.284   0.071   1.125
    19        0.065   1.091   0.125   1.226
    20        0.139   1.227   0.147   1.183

Table 58
Descriptive Statistics for Noncompensatory MD40 Linking Items

                 Form A          Form B
Replication    Mean      SD    Mean      SD
     1        0.336   1.115   0.321   1.246
     2        0.341   1.203   0.369   1.074
     3        0.384   1.090   0.286   1.007
     4        0.330   1.060   0.315   1.092
     5        0.324   1.097   0.298   1.207
     6        0.303   1.212   0.325   1.051
     7        0.335   1.070   0.416   1.119
     8        0.431   1.128   0.279   1.061
     9        0.291   1.083   0.397   1.095
    10        0.403   1.080   0.270   1.041
    11        0.323   1.084   0.448   1.024
    12        0.468   1.089   0.295   1.000
    13        0.316   1.010   0.362   1.076
    14        0.398   1.111   0.265   1.043
    15        0.282   1.048   0.336   1.091
    16        0.370   1.118   0.303   1.106
    17        0.315   1.151   0.306   1.076
    18        0.324   1.127   0.277   1.151
    19        0.303   1.163   0.301   1.027
    20        0.297   1.039   0.311   1.039

Table 59
Descriptive Statistics for Compensatory Linking Items with Nonequivalent
Examinee Groups

                   Form A              Form B
              (Average Ability)     (Low Ability)
Replication     Mean    S.D.        Mean    S.D.

MD10
     1         -0.20    0.79        0.90    1.15
     2         -0.10    0.79        0.93    1.25
     3         -0.12    0.80        1.03    1.39
     4         -0.17    0.81        1.01    1.37
     5         -0.17    0.89        1.05    1.28

MD20
     1         -0.65    0.85        0.10    1.31
     2         -0.67    0.85        0.11    1.39
     3         -0.65    0.85        0.04    1.32
     4         -0.60    0.82        0.13    1.39
     5         -0.62    0.82        0.39    1.83

MD30
     1         -0.01    1.08        0.88    1.53
     2         -0.05    1.09        0.77    1.62
     3         -0.11    1.09        0.71    1.44
     4         -0.16    1.07        0.63    1.47
     5         -0.04    1.08        0.73    1.37

MD40
     1         -0.31    0.97        0.57    1.38
     2         -0.32    0.96        0.43    1.37
     3         -0.31    0.95        0.54    1.37
     4         -0.33    0.94        0.53    1.51
     5          0.25    0.95        0.54    1.51

REFERENCES 

Ackerman,  T.  A.  (1988,  April).  An  explanation  of  differential  item 
functioning  from  a  multidimensional  perspective.  A  paper  presented  at  the 
annual  meeting  of  the  American  Educational  Research  Association,  New 
Orleans. 

Ackerman,  T.  A.  (1989).  Unidimensional  IRT  calibration  of  compensatory 
and  noncompensatory  multidimensional  items.  Applied  Psychological 
Measurement.  13,  113-127. 

Ackerman,  T.  A.  (1992).  A  didactic  explanation  of  item  bias,  item  impact, 
and  item  validity  from  a  multidimensional  perspective.  Journal  of  Educational 
Measurement,  29,  67-91. 

Angoff,  W.  H.  (1971).  Norms,  scales,  and  equivalent  scores.  In  R.  L. 
Thorndike  (Ed.),  Educational  measurement  (2nd  ed.).  Washington,  DC: 
American  Council  on  Education. 

Angoff, W. H. (1988). Proposals for theoretical and applied development
in measurement. Applied Measurement in Education, 1, 215-222.

Angoff, W. H., & Cowell, W. R. (1986). An examination of the assumption
that the equating of parallel forms of a test is population-independent. Journal
of Educational Measurement, 23, 327-345.

Ansley,  T.  N.,  &  Forsyth,  R.  A.  (1985).  An  examination  of  the 
characteristics  of  unidimensional  IRT  parameter  estimates  derived  from  two- 
dimensional  data.  Applied  Psychological  Measurement.  9,  37-48. 

Baker,  F.  B.  (1990).  Some  observations  on  the  metric  of  PC-BILOG 
results.  Applied  Psychological  Measurement.  14.  139-150. 

Baker, F. B., Al-Karni, A., & Al-Dosary, I. M. (1991). EQUATE: A
computer program for the test characteristic curve method of IRT equating.
Madison, WI: University of Wisconsin.

Birnbaum,  A.  (1968).  Some  latent  trait  models  and  their  use  in  inferring 
an  examinee's  ability.  In  F.  M.  Lord  and  M.  R.  Novick,  Statistical  theories  of 
mental  test  scores  (pp.  397-479).  Reading,  MA:  Addison-Wesley. 


Bock,  R.  D.,  Gibbons,  R.,  &  Muraki,  E.  (1988).  Full-information  item 
factor  analysis.  Applied  Psychological  Measurement.  12,  261-280. 

Braun,  H.  I.,  &  Holland,  P.  W.  (1982).  Observed-score  test  equating:  A 
mathematical  analysis  of  some  ETS  equating  procedures.  In  P.  W.  Holland  & 
D.  B.  Rubin  (Eds.),  Test  equating  (pp.  9-49).  New  York:  Academic  Press. 

Brennan, R. L., & Kolen, M. J. (1987). Some practical issues in
equating. Applied Psychological Measurement, 11, 279-290.

Buhr, D. C., & Algina, J. (1986, April). A comparison of item parameter
estimates and ability parameter estimates obtained by different methods
implemented by BILOG. Paper presented at the annual meeting of the
American Educational Research Association, San Francisco.

Camilli,  G.  (1992).  A  conceptual  analysis  of  differential  item  functioning 
in  terms  of  a  multidimensional  item  response  model.  Applied  Psychological 
Measurement.  16,  129-147. 

Camilli, G., Wang, M., & Fesq, J. (1995). The effects of dimensionality
on equating the Law School Admission Test. Journal of Educational
Measurement, 32, 79-96.

Cook, L. L., & Eignor, D. R. (1983, April). An investigation of the
feasibility of applying item response theory to equate achievement tests. Paper
presented at the annual meeting of the American Educational Research
Association, Montreal.

Cook, L. L., & Eignor, D. R. (1988). Using item response theory in test
score equating. International Journal of Educational Research, 23, 161-173.

Cook, L. L., Eignor, D. R., & Taft, H. L. (1988). A comparative study of
the effects of recency of instruction on the stability of IRT and conventional item
parameter estimates. Journal of Educational Measurement, 25, 31-45.

Cook, L. L., & Petersen, N. S. (1987). Problems related to the use of
conventional and item response theory equating methods in less than optimal
circumstances. Applied Psychological Measurement, 11, 225-244.

Crocker, L., & Algina, J. (1986). Introduction to classical and modern
test theory. New York: Holt, Rinehart, and Winston.


Doody-Bogan, E., & Yen, W. M. (1983, April). Detecting
multidimensionality and examining its effect on vertical equating with the three-
parameter logistic model. A paper presented at the annual meeting of the
American Educational Research Association, Montreal.

Dorans,  N.  J.  (1990).  Equating  methods  and  sampling  designs.  Applied 
Measurement  in  Education,  3,  3-17. 

Dorans, N. J., & Kingston, N. M. (1985). The effects of violations of
unidimensionality on the estimation of item and ability parameters and on the
item response theory equating of the GRE verbal scale. Journal of Educational
Measurement, 22, 249-262.

Drasgow,  F.,  &  Parsons,  C.  K.  (1983).  Application  of  unidimensional  item 
response  theory  models  to  multidimensional  data.  Applied  Psychological 
Measurement.  7,  189-199. 

Hambleton,  R.  K.,  &  Swaminathan,  H.  (1985).  Item  response  theory. 
Boston:  Kluwer-Nijhoff. 

Harris,  D.  J.,  &  Kolen,  M.  J.  (1986).  Effect  of  examinee  group  on 
equating  relationships.  Applied  Psychological  Measurement,  10,  35-43. 

Harrison, D. A. (1986). Robustness of IRT parameter estimation to
violation of the unidimensionality assumption. Journal of Educational Statistics,
11, 91-115.

Hattie,  J.  (1985).  Methodology  review:  Assessing  unidimensionality  of 
tests  and  items.  Applied  Psychological  Measurement.  9.  139-164. 

Hills,  J.  R.,  Subhiyah,  R.  G.,  &  Hirsch,  T.  M.  (1988).  Equating  minimum 
competency  tests:  Comparison  of  methods.  Journal  of  Educational 
Measurement.  25,  221-231. 

Hirsch,  T.  M.  (1989).  Multidimensional  equating.  Journal  of  Educational 
Measurement.  26,  337-349. 

Hirsch, T. M., & Miller, T. R. (1991). Comparison of rotational methods
applied to multidimensional item response theory item-parameter estimates.
Paper presented at the annual meeting of the American Educational Research
Association, Chicago.

Holland, P. W., & Rubin, D. B. (Eds.). (1982). Test equating. New
York: Academic Press.


Hulin, C. L., Lissak, R. J., & Drasgow, F. (1982). Recovery of two- and
three-parameter item characteristic curves: A Monte Carlo study. Applied
Psychological Measurement, 6, 249-260.

Kingston,  N.  M.,  &  Dorans,  N.  J.  (1984).  Item  location  effects  and  their 
implications  for  IRT  equating  and  adaptive  testing.  Applied  Psychological 
Measurement,  8,  147-154. 

Klein,  L.  W.,  &  Jarjoura,  D.  (1985).  The  importance  of  content 
representation  for  common-item  equating  with  nonrandom  groups.    Journal  of 
Educational  Measurement,  22,  197-206. 

Klein, L. W., & Kolen, M. J. (1985, April). Effects of number of common
items in common-item equating with nonrandom groups. Paper presented at the
annual meeting of the American Educational Research Association, Chicago.

Kolen,  M.  J.  (1981).  Comparison  of  traditional  and  item  response  theory 
methods  for  equating  tests.  Journal  of  Educational  Measurement,  18,  1-11. 

Kolen,  M.  J.,  &  Whitney,  D.  R.  (1982).  Comparison  of  four  procedures 
for  equating  the  tests  of  General  Educational  Development.  Journal  of 
Educational  Measurement.  19,  279-293. 

Livingston,  S.  A.,  Dorans,  N.  J.,  &  Wright,  N.  K.  (1990).  What 
combination  of  sampling  and  equating  methods  works  best?  Applied 
Measurement  in  Education.  3,  73-95. 

Lord,  F.  M.  (1968).  An  analysis  of  the  Verbal  Scholastic  Aptitude  Test 
using  Birnbaum's  three-parameter  logistic  model.  Educational  and 
Psychological  Measurement,  28,  989-1020. 

Lord,  F.  M.  (1980).  Applications  of  item  response  theory  to  practical 
testing  problems.  Hillsdale,  NJ:  Lawrence  Erlbaum. 

Marco, G. L., Petersen, N. S., & Stewart, E. E. (1983). A test of the
adequacy of curvilinear score equating models. In D. L. Weiss (Ed.), New
horizons in testing: Latent trait theory and computerized adaptive testing (pp.
147-177). New York: Academic.

Mislevy,  R.  J.,  &  Bock,  R.  D.  (1987).  PC-BILOG  maximum  likelihood 
item  analysis  and  test  scoring:  Logistic  model.  Mooresville,  IN:  Scientific 
Software. 

Mislevy,  R.  J.,  &  Bock,  R.  D.  (1990).  BILOG  3:  Item  analysis  and  test 
scoring  with  binary  logistic  models.  Mooresville,  IN:  Scientific  Software. 


Mislevy,  R.  J.,  &  Stocking,  M.  L.  (1989).  A  consumer's  guide  to  LOGIST 
and  BILOG.  Applied  Psychological  Measurement,  13,  57-75.

Nandakumar,  R.  (1991).  Traditional  dimensionality  versus  essential 
dimensionality.  Journal  of  Educational  Measurement,  28,  99-117.

Nandakumar,  R.  (1994).  Assessing  dimensionality  of  a  set  of  item 
responses-comparison  of  different  approaches.  Journal  of  Educational 
Measurement,  31,  17-35.

Nandakumar,  R.,  &  Stout,  W.  (1993).  Refinements  of  Stout's  procedure
for  assessing  latent  trait  unidimensionality.  Journal  of  Educational  Statistics,  18,
41-68. 

Oshima,  T.  C.,  &  Miller,  M.  D.  (1990).  Multidimensionality  and  IRT-
based  item  invariance  indices:  The  effect  of  between  group  variation  in  trait
correlation.  Journal  of  Educational  Measurement,  27,  273-283.

Petersen,  N.  S.,  Cook,  L.  L.,  &  Stocking,  M.  L.  (1983).  IRT  versus
conventional  equating  methods:  A  comparative  study  of  scale  stability.  Journal
of  Educational  Statistics,  8,  137-156.

Petersen,  N.  S.,  Kolen,  M.,  &  Hoover,  H.  D.  (1989).  Scaling,  norming, 
and  equating.  In  R.  L.  Linn  (Ed.),  Educational  measurement  (3rd  ed.).
Washington,  DC:  American  Council  on  Education. 

Qualls,  A.  L.,  &  Ansley,  T.  N.  (1985,  April).  A  comparison  of  item  and
ability  parameter  estimates  derived  from  LOGIST  and  BILOG.  Paper  presented 
at  the  annual  meeting  of  the  National  Council  on  Measurement  in  Education, 
Chicago. 

Reckase,  M.  D.  (1979).  Unifactor  latent  trait  models  applied  to 
multifactor  tests:  Results  and  implications.  Journal  of  Educational  Statistics,  4, 
207-230. 

Reckase,  M.  D.  (1985).  The  difficulty  of  test  items  that  measure  more 
than  one  ability.  Applied  Psychological  Measurement,  9,  401-412.

Reckase,  M.  D.,  Ackerman,  T.  A.,  &  Carlson,  J.  E.  (1988).  Building  a 
unidimensional  test  using  multidimensional  items.  Journal  of  Educational 
Measurement,  25,  193-203.

Ree,  M.  J.  (1979).  Estimating  item  characteristic  curves.  Applied 
Psychological  Measurement,  3,  371-385.


Roznowski,  M.,  Tucker,  L.,  &  Humphreys,  L.  (1991).  Three  approaches
to  determining  the  dimensionality  of  binary  items.  Applied  Psychological 
Measurement,  15,  109-127. 

SAS  Institute  Inc.  (1990).  Statistical  analysis  system  (6.07).  Cary,  NC: 
SAS  Institute. 

Skaggs,  G.,  &  Lissitz,  R.  W.  (1986a).  IRT  test  equating:  Relevant  issues
and  a  review  of  recent  research.  Review  of  Educational  Research,  56,  495-529. 

Skaggs,  G.,  &  Lissitz,  R.  W.  (1986b).  An  exploration  of  the  robustness
of  four  test  equating  models.  Applied  Psychological  Measurement,  10,  303-317.

Stocking,  M.  L.,  &  Lord,  F.  M.  (1983).  Developing  a  common  metric  in
item  response  theory.  Applied  Psychological  Measurement,  7,  201-210.

Stout,  W.  F.  (1990).  A  new  item  response  theory  modeling  approach 
with  applications  to  unidimensionality  assessment  and  ability  estimation. 
Psychometrika,  55,  293-325.

Swaminathan,  H.,  &  Gifford,  J.  A.  (1983).  Estimation  of  parameters  in
the  three-parameter  latent  trait  model.  In  D.  J.  Weiss  (Ed.),  New  horizons  in
testing  (pp.  13-30).  New  York:  Academic  Press.

Swaminathan,  H.,  &  Gifford,  J.  A.  (1985).  Bayesian  estimation  in  the
two-parameter  model.  Psychometrika,  50,  349-364.

Sympson,  J.  B.  (1978).  A  model  for  testing  with  multidimensional  items. 
In  D.  J.  Weiss  (Ed.),  Proceedings  of  the  1982  item  response  theory/
computerized  adaptive  testing  conference  (pp.  151-177).  Minneapolis,  MN: 
University  of  Minnesota,  Department  of  Psychology. 

Wang,  M.  (1986,  April).  Fitting  a  unidimensional  model  to 
multidimensional  item  response  data.  Paper  presented  at  the  ONR  contractors 
conference,  Gatlinburg,  TN. 

Way,  W.  D.,  Ansley,  T.  N.,  &  Forsyth,  R.  A.  (1988).  The  comparative 
effects  of  compensatory  and  noncompensatory  two-dimensional  data  on 
unidimensional  IRT  estimates.  Applied  Psychological  Measurement,  12,  239-252.


Wingersky,  M.  S.,  Cook,  L.  L.,  &  Eignor,  D.  R.  (1986,  April).  Specifying
the  characteristics  of  linking  items  used  for  item  response  theory  item 
calibration.  Paper  presented  at  the  annual  meeting  of  the  American  Educational 
Research  Association,  San  Francisco. 

Wingersky,  M.  S.,  &  Lord,  F.  M.  (1984).  An  investigation  of  methods  for 
reducing  sampling  error  in  certain  IRT  procedures.  Applied  Psychological 
Measurement,  8,  347-364. 

Wood,  R.  L.,  Wingersky,  M.  S.,  &  Lord,  F.  M.  (1976).  LOGIST  -  A
computer  program  for  estimating  examinee  ability  and  item  characteristic  curve 
parameters.  (Research  Memorandum  76-6).  Princeton,  NJ:  Educational 
Testing  Service. 

Yen,  W.  (1984).  Effects  of  local  item  dependence  on  the  fit  and 
equating  performance  of  the  three-parameter  logistic  model.  Applied 
Psychological  Measurement,  8,  125-145. 

Yen,  W.  (1987).  A  comparison  of  the  efficiency  and  accuracy  of  BILOG 
and  LOGIST.  Psychometrika,  52,  275-291.


BIOGRAPHICAL  SKETCH 

Patricia  Duffy  Spence  was  born  in  Plainfield,  New  Jersey,  where  she 
continued  to  live  until  graduating  from  high  school.  She  moved  with  her  family 
to  Daytona  Beach,  Florida,  and  received  her  Bachelor  of  Arts  degree  in 
elementary  education  from  the  University  of  Central  Florida,  Orlando.  While 
teaching  elementary  and  middle  school  children  in  Volusia  County,  Patricia 
completed  requirements  for  the  Master  of  Education  degree  in  educational 
leadership,  also  at  the  University  of  Central  Florida. 

After  completing  her  doctoral  classes  at  the  University  of  Florida  in 
Gainesville,  Patricia  moved  to  San  Antonio  to  work  as  Project  Director  for 
Customized  State  Testing  Programs  at  The  Psychological  Corporation.  She 
returned  to  Florida  where  she  is  currently  employed  as  the  Research, 
Evaluation,  and  School  Improvement  Specialist  for  the  Florida  Region  III  Title  I 
Technical  Assistance  Center  in  Orlando.  She  also  teaches  graduate  research 
and  measurement  courses  as  an  adjunct  at  the  University  of  Central  Florida. 

Patricia  and  her  husband,  Verne  Spence,  have  a  daughter,  Cindy,
currently  an  undergraduate  majoring  in  art,  also  at  the  University  of  Florida.


I  certify  that  I  have  read  this  study  and  that  in  my  opinion  it  conforms  to 
acceptable  standards  of  scholarly  presentation  and  is  fully  adequate,  in  scope 
and  quality,  as  a  dissertation  for  the  degree  of  Doctor  of  Philosophy. 


M.  David  Miller,  Chair 
Associate  Professor  of 
Foundations  of  Education 


I  certify  that  I  have  read  this  study  and  that  in  my  opinion  it  conforms  to 
acceptable  standards  of  scholarly  presentation  and  is  fully  adequate,  in  scope 
and  quality,  as  a  dissertation  for  the  degree  of  Doctor  of  Philosophy. 



James  J.  Algina
Professor  of  Foundations  of
Education


I  certify  that  I  have  read  this  study  and  that  in  my  opinion  it  conforms  to 
acceptable  standards  of  scholarly  presentation  and  is  fully  adequate,  in  scope 
and  quality,  as  a  dissertation  for  the  degree  of  Doctor  of  Philosophy. 


Linda  M.  Crocker 
Professor  of  Foundations  of 
Education 


I  certify  that  I  have  read  this  study  and  that  in  my  opinion  it  conforms  to 
acceptable  standards  of  scholarly  presentation  and  is  fully  adequate,  in  scope 
and  quality,  as  a  dissertation  for  the  degree  of  Doctor  of  Philosophy. 


Sn  Newell 
Professor  of  Foundations  of 
Education 


I  certify  that  I  have  read  this  study  and  that  in  my  opinion  it  conforms  to 
acceptable  standards  of  scholarly  presentation  and  is  fully  adequate,  in  scope 
and  quality,  as  a  dissertation  for  the  degree  of  Doctor  of  Philosophy. 


Paul  George 
Professor  of  Educational 
Leadership 


This  dissertation  was  submitted  to  the  Graduate  Faculty  of  the  College  of 
Education  and  to  the  Graduate  School  and  was  accepted  as  partial  fulfillment  of 
the  requirements  for  the  degree  of  Doctor  of  Philosophy. 



May,  1996 

Chairman,  Foundations  of 
Education 


Dean,  College  of  Education


Dean,  Graduate  School 




